Graphics API
API stands for Application Programming Interface: basically a set of conventions / standards that computer engineers have come up with for writing software against. We need to pick sides here.
Choosing a graphics API to base our software upon is one of the most fundamental design decisions we are going to make. For all practical purposes (read: sunk man-month reasons), once we choose an API we will be “stuck” with it forever. This is one of the topics where I intentionally choose performance over development velocity. We could speed up software development by choosing ready-built engines / frameworks such as the open-source Dear ImGui, Godot, Qt etc. However, although “engines” isolate the software from the underlying APIs, we may get constrained by the engine itself at some point in the future. We rule out proprietary engines such as Unity and Unreal Engine for political reasons! Fun fact: this attitude is sometimes called NIH Syndrome, i.e. Not-Invented-Here Syndrome. ;) So, coming back to lower-level APIs, we have a limited choice of APIs on each operating system.
On Windows, we have DirectX 9 / 10 / 11 / 12, OpenGL and Vulkan. OpenGL has seen no new version since 4.6 (2017), and newer graphics features such as ray tracing aren’t supported by it. Vulkan is generally a second-class citizen on Windows compared to DirectX. Hence we choose the most modern flavor, DirectX 12. Remember, DirectX 12 itself was announced in 2014 and shipped with Windows 10 in 2015, so setting it as a baseline requirement for our software is a reasonable decision. Hence DirectX 12 is our ONLY graphics API for the Windows operating system. We support both Windows 10 and 11 for now (2025). This covers perhaps 90% of our target users worldwide. We also presume support for Heap_Tier_2 (D3D12_RESOURCE_HEAP_TIER_2) in DirectX 12. Note: Heap_Tier_2 hardware started appearing in the 2015/2016 timeline. Which Shader Model level? To be figured out. If you are feeling over-hyped to go deep down, read the 1st (of 4) tutorial on DirectX 12 here. It is ~100 pages!
The next operating system by “market share” is macOS on Apple devices. In the Apple world, Metal is the only recommended (non-deprecated) API, hence we go with Metal. Vulkan does work through a translation layer such as MoltenVK, but for performance and first-party support we choose the Metal API. Mac graphics / Metal API code shall also be partially reusable on iPhone / iPad devices, since Metal is the preferred API there too.
Next up is the Linux (Ubuntu) operating system. Linux being an open-source operating system, the open standard Vulkan is preferred here. We want our software to be available even on free operating systems, hence we must have a Vulkan-based UI as well. Another reason for keeping this Vulkan interface is its overlap with the Android mobile operating system. For Android phones, we have only 2 options: the older OpenGL ES or the modern Vulkan. Hence we choose Vulkan, at a version released within the last 10 years, i.e. Vulkan 1.1.
The above 3 APIs are for desktop applications. Next up is the browser-based engine. Here the upcoming (as of 2025) API named WebGPU is the chosen one. It is supported by all major web-browser vendors, i.e. Google Chrome, Apple Safari and Mozilla Firefox.
Having made the above decisions, we have to be realistic about our core-engineering-degree-holder software developers. We can’t expect people with a chemical / civil / electrical / instrumentation / mechanical background to be familiar with such deep computer-science concepts. Hence we structure our code as a sort of mini-engine (NIH?), where adding a new UI element doesn’t involve fiddling deep down in the graphics APIs. This will be sorted out progressively as our software matures.
Our software installer will verify that all the relevant APIs are present on the system before installation. This way, inside the application, we don’t check every time whether a particular feature is supported by the available hardware, unless the installed hardware itself changes. By default, this hardware-change check shouldn’t take more than a few microseconds during application startup.
More graphics design decisions, as specified in our source code:
// Copyright (c) 2025-Present : Ram Shanker: All rights reserved.

/*
Windows Desktop C++ DirectX12 application for CAD / CAM use.
This file is our architecture document. Primitive data structures common to all platforms may be added here.

At startup, pick the GPU with the highest VRAM. All rendering happens on it only. Only 1 device is supported for rendering.
However, the OS may send the display frame to a monitor connected to another / integrated GPU.

VERTEX DETAILS:
Vertex layout common to all geometry:
3x4 Bytes for Position, 4 Bytes for Normal, 4 Bytes for Color RGBA / 8 Bytes if an HDR monitor is present
 = 20 / 24 Bytes per vertex.
Anyway, go with the 24-Byte format ONLY. Tone mapping (HDR -> SDR) should happen in the Pixel Shader.
Initial development will be on R8G8B8A8; later, when we implement HDR, we will upgrade this.
Some hardware may not support HDR, so keep both versions of the shaders.
Whether to load HDR or SDR shaders is decided at application startup.
If the graphics card supports HDR and at least 1 monitor with HDR capability is present, switch to HDR.
Once HDR is ON, the application keeps the HDR shaders even if the HDR monitors disconnect, until the app closes.

Initially Hemispheric Ambient Lighting:
Factor = (Normal.z * 0.5) + 0.5
AmbientLight = Lerp(GroundColor, SkyColor, Factor)
Screen Space Ambient Occlusion (SSAO) to darken creases and corners in a future revision.

All vertices are positioned in object local space. The world matrix is applied in the vertex shader.
This means moving even a 1000-vertex object needs just a 48-byte world matrix update per object.
We use a packed 48-byte world matrix instead of 64 bytes to save bandwidth.
Since the last row is always 0,0,0,1, we can omit it. In the shader, we reconstruct the last row.

Separate render threads (1 per monitor) and a single Copy thread. The Copy thread is the ringmaster of VRAM!
Each per-monitor render thread is in VSync with its monitor's own refresh rate,
with a separate render queue per monitor.

We use the ExecuteIndirect command with a start vertex location instead of DrawIndexedInstanced per object.
I want per-tab VRAM isolation; each tab will be completely separate,
except for the unclosable tab 0, which stores common textures and UI elements.

To support 100s of simultaneous tabs, we start with a small heap, say 4MB per tab, and grow the heap size only when necessary,
instead of allocating 1 giant 256MB buffer.
Each page could be a mixture of various geometry types, say Cylinders, Cubes, I-beams etc.
Don't manually destroy heaps on tab switch. Use Evict; it allows the OS to handle the caching.
If the user clicks back to a heavy tab, MakeResident is faster than re-creating heaps. Tab 0 is always resident.
Eviction happens with a time lag of a few seconds.
An advanced, system-memory-budget-based eviction strategy comes after the rest of the spec is implemented.

Each page will be accompanied by a corresponding ExecuteIndirect argument buffer.
Each TAB will also have its dedicated World Matrix buffer.
When we defragment a page, we must simultaneously rebuild its corresponding Argument Buffer.

LOCK FREE VRAM MANAGEMENT:
We will now be using the ExecuteIndirect command + versioned geometry pages. Page max size 4 MB (initially).
On geometry modify (Add / Modify / Delete):
If the amount of new geometry (+ the filled-up last active page) is more than the 4 MB page threshold, create new pages.
Do not touch existing ones. And then publish. Otherwise:
Allocate a new page via the Copy Queue. The Copy Queue makes a READ-ONLY operation (allowed) on the existing page to create a newPage.
This newPage is not yet published for rendering.
DirectQueue0, DirectQueue1, DirectQueue2 and so on can keep rendering as usual. Leave oldPage in the COMMON state permanently.
Never explicitly transition it to VERTEX.
Render / CopySource are both allowed on the respective queues by implicit promotion from the COMMON state.
CopyQueue: Finalize this VRAM-copied newPage as required. Upload the delta. For additions, just add;
for modify / delete, if page free space < threshold → rebuild / defragment the page.
Publish the pointer swap atomically. Once all render threads have passed a fence, retire the old page later by releasing its buffers.

Geometry will NOT be stored in CPU RAM once it is uploaded to VRAM, for memory efficiency reasons,
keeping in mind the memory scarcity on iGPU systems!
Geometry is generated on demand by the engineering thread and simply handed over to the copy queue.
However, to be able to defragment, the copy queue stores the Byte/Index ranges of all objects loaded into a Page.

The copy queue prepares the newPage (VertexBuffer, IndexBuffer, ExecuteIndirect Buffers) and uploads it to VRAM.
This (PCIe transfer) happens in parallel while the other render threads are already running.
So, iteration over all objects has been removed altogether from the engine.
Further, there are 2 levels of batching. The engineering thread will batch the changes together to some extent,
and the copy thread will also batch the changes by emptying the submission list.

There will be multiple render threads running at different VSync refresh rates (say 1 at 60 Hz, 1 at 144 Hz, 1 at 30 Hz).
Each monitor has its own render thread, command queue / allocator / command list.

Geometry pages:
• Created in COMMON
• Never explicitly transitioned
• Only used in read-only states
• Copied from (COPY_SOURCE)
• Copied into (COPY_DEST) only before publishing. Once published, there is no write operation on it.
• Drawn from (VERTEX / INDEX / INDIRECT)

Our strict invariants:
• Geometry pages are immutable after publish.
• No explicit state transitions for page buffers.
• All page swaps are atomic.
• Old pages are destroyed only after all queues have retired them.

There will be multiple views per tab.
Each View will maintain a pair (double-buffered) of ExecuteIndirect command buffers.
When an object is deleted, the copy thread receives a command from the engineering thread.
The copy thread then updates the next double buffer and records the hole in the Vertex/Index buffer.
Except for the currently filling head buffer,

Maintain a Free-List Allocator (e.g., a Segregated Free List) on the CPU, per Tab.
The Allocator knows: "I have a 12KB middle gap in Page 3, and a 40KB middle gap in Page 8."
When a 10KB request comes in, the Allocator immediately returns "Page 3". No iterating through Page objects.
If the free list says none of the existing pages can accommodate the new geometry, then create a new heap / placed resource buffer.
The free list does not track internal holes created by deleting objects,
only the middle empty space. Aggregate holes are tracked per page and defragmented occasionally.

When a buffer gets >25% holes, the copy thread creates a new defragmented buffer and, once complete,
switches over to the new buffer for new geometry additions. At most 1 buffer is defragmented at a time (between 2 frames).
Since the max page size is 64MB, this will not produce a high-latency stall during the async work of the copy thread.

The Root Signature puts the "Constants" (View/Proj matrix) in root constants or a very fast descriptor table,
as these don't change between pages. Only the VBV/IBV and the EI Argument Buffer change per batch/page.

OBJECT REPRESENTATION:
Here is the realistic "Worst Case" hierarchy for a CAD frame:
• Index Depth: 16-bit vs 32-bit (Hardware Requirement). Examples: Nuts/Bolts (16) vs Engine Blocks (32).
• Transparency: Opaque vs Transparent (Sorting Requirement). Transparent objects must be drawn last for alpha blending.
• Topology: Triangles (Solid) vs Lines (Wireframe) (PSO Requirement).
  We cannot draw lines and triangles in the same call.
• Culling: Single-Sided vs Double-Sided (PSO Requirement). Sheet metal vs Solids.
  Since sectioning is a common use case, perhaps we could make all geometry double-sided. To be ascertained later.
• Buffer Pages (N): How many pages (4 to 64 MB each) we are using.
Total Unique Batches = 2 x 2 x 2 x 2 x N = 16 x N
This ensures no pipeline state reset while rendering a single Page. One ExecuteIndirect call for every Page.

To be clarified later: How do we handle repeated geometry? Say bolts.
They will only need 1 set of vertex/index buffers. We can draw them with different world matrices.

NORMALS:
A common industry solution for normals is not 16-bit floats, but packed 10-bit integers.
We use the format: DXGI_FORMAT_R10G10B10A2_UNORM.
X: 10 bits (0 to 1023), Y: 10 bits (0 to 1023), Z: 10 bits (0 to 1023), Padding: 2 bits (unused)
Total: 32 bits (4 Bytes). Why this is perfect for normals:
Size: It is 3x smaller than a 12-byte float normal (4 bytes vs 12 bytes). Precision: 10 bits gives us 2^10 = 1024 steps.
Since normals are always between -1.0 and 1.0, this gives a precision of 2 / 1023, roughly 0.002.
This is visually indistinguishable from 32-bit floats for lighting, even in high-end CAD.
Vertex shader decode (UNORM [0,1] back to [-1,1]): Normal = Input.Normal * 2.0 - 1.0.

PAGE STRUCTURE:
Vertex and Index buffers in the same Page: a superior architectural choice for two reasons:
Halves the Allocation Overhead: We only manage 1 heap/resource per 4MB page instead of 2.
Cache Locality: When the GPU fetches a mesh, the vertices and indices are physically close in VRAM (same memory page).
This can slightly improve cache hit rates.
Vertices start at Offset 0 and grow UP. Indices start at Offset Max (4MB) and grow DOWN.
Free Space is always the gap in the middle. The Page is Full when Vertex_Head_Ptr meets or crosses Index_Tail_Ptr.
A mandatory 64-Byte gap in the middle addresses alignment concerns.

Lazy Creation:
When a user creates a new Tab, allocated memory = 0 MB.
User draws a Bolt (Solid): allocate Solid_Page_0 (4MB).
User draws a Glass Window: allocate Transparent_Page_0 (4MB).
User never draws a Wireframe: Wireframe_Page remains null.

Resource states are combined, i.e. D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER | D3D12_RESOURCE_STATE_INDEX_BUFFER

Feature        | Decision                                | Benefit
Page Content   | Single Type Only                        | Zero PSO switching during Draw.
Growth Logic   | Chained Doubling (4->8->16->32->64 MB)  | No moving of old data.
Max Page Size  | 64 MB                                   | Prevents fragmentation failure on low-VRAM GPUs.
Allocation     | Lazy (On Demand)                        | Keeps "Hello World" tabs lightweight.
Sub-Allocation | Double-Ended Stack                      | Maximizes usage for varying ratios of Vertex/Index Buffers.

New geometry is appended (in the middle) only if both the new vertex and index buffers fit inside.
Otherwise, allocate a new buffer. The copy thread also does batching:
it aggregates all objects coming from the engineering thread (that fit in the current buffer) into a single GPU upload.
The Copy Thread should consume batches of updates,
coalescing them into single ExecuteCommandLists calls where possible to reduce API overhead.

"Big Buffer" fallback: If Allocation_Size > Max_Page_Size,
allocate a dedicated Committed Resource just for that object, bypassing the paging system.
Handles a large STL or a terrain map. Treat "Big Buffers" as a special Page Type. Add a "Large Object List" to the render loop.
Do not try to jam them into the standard EI logic if they require unique resource bindings per object.
1 separate draw command for each such Jumbo object.

Create a separate std::vector<BigObject> in the Tab structure. Rendering:
Loop through Pages (ExecuteIndirect).
Loop through BigObjects (standard DrawIndexedInstanced or EI with count 1).

Defragmentation Logic:
The copy queue marks the page for defragmentation. All frames of that tab freeze; keep presenting the previous render output.
Any 1 of the rendering threads/queues reads the mark, transitions the resource to COMMON and signals a fence.
The copy queue picks it up and, once defragmented, returns the new resource.
I am willing to accept a freeze of a few frames on screen.
This is a recognised engineering tradeoff, acceptable to CAD users.

EI Argument Buffers are tightly coupled to the Memory Pages.
When we defragment a Page, we must simultaneously rebuild its corresponding Argument Buffer.
Do not try to "patch" the Argument Buffer; regenerate it for that Page.

Growth Logic: Similar to the defragmentation above. How does my copy queue handle the async (without blocking the render thread)
addition of 1 small geometry, say 10KB, to an already existing 64MB heap of which 50MB is filled up?
All Views/frames of that particular tab freeze. However, other tabs handled by the render thread keep processing.
No thread stall. Transition that page to copy destination. Copy the new data.
Transition it back to the render state for the render thread to pick up.

FREEZE LOGIC:
RenderToTexture implements the frame freeze, since the swap chain is FLIP_DISCARD.
Side benefits: HDR handling, UI composition, multi-monitor flexibility, eviction safety, clean defrag freezes.

Known Issues / Limitations (to be resolved in a later revision):
Transparency sorting: accept imperfect sorting for "Glass" pages during rotation,
 and do a CPU Sort + Args Rebuild only when the camera stops.
Hot page for object drag / active mutation.
Evict logic.
Compute shader frustum culling.
Telemetry: per-tab VRAM usage graphs, page fragmentation heatmap, eviction frequency counters,
 copy queue stall tracking.
Selection highlighter methodology.
Mesh Shaders on supported hardware (RTX 2000 series onwards, RX 6000 series onwards).
Instance-based LOD optimization, optionally using a compute shader.

Miscellaneous Specification:
There will be a uniform object ID (64-bit) unique across all objects in the entire process memory.
Each object can have up to 16(?) different simultaneous variations of vertex geometry / graphics representation.
We are expecting 1000 to 5000 draw calls per frame(?).
How should I handle multiple partially overlapping windows?
Each window can be independently resized or maximized / minimized.
The lowest distance between an object and ALL the different view camera positions shall be used by the logic threads
 to decide the Level of Detail.
There will be some mechanism to manage memory overpressure:
signal the logic threads to reduce the level of detail within some distance.
Our GPU Memory manager will be a singleton. There will be only 1 instance of that class managing the entire GPU memory.

Consider a desktop PC with 2 discrete graphics cards and 1 integrated graphics card,
with 1 monitor connected and active on each of these 3 devices.
We use exactly 1 device for rendering for all monitors!
Windows 10/11 WDDM supports heterogeneous multi-adapter. When a window moves, DWM composites the surfaces
and the frame is copied across adapters if needed. This works but is slow, since all frames need to traverse the PCIe bus.

TO-DO LIST: As things get completed,
 they will be removed from this pending list and incorporated appropriately in the design document.

Phase 1: The Visual Baseline (Get these out of the way)
[Done] Update Vertex format to include Normals. (Required for lighting).
[Done] Hemispherical Lighting in shader. (Verify normals are correct).
[Done] Mouse Zoom/Pan/Rotate (Basic).

Phase 2: The "Freeze" Infrastructure
Before you break the memory model, build the mechanism that hides the breakage.
[Done] Render To Texture (RTT) & Full-Screen Quad. Goal: Detach the "Drawing" from the "Presenting."

Phase 3: The API Pivot (The Hardest Part)
Switching to ExecuteIndirect changes how you pass data. Do this BEFORE implementing custom heaps to isolate variables.
[Done] Implement Structured Buffer for World Matrix. StructuredBuffer<float4x4> and a root constant index.
We cannot do ExecuteIndirect for multiple objects without a way to tell the shader which object is being drawn.
[Done] DrawIndexedInstanced → ExecuteIndirect (EI).
Advice: Implement this using your current committed resources first. Just get the API call working.

Phase 4: The Memory Manager (The "Vishwakarma" Core)
Now that EI is working, replace the backing memory.
[ ] [MISSING] Global Upload Ring Buffer.
Critical: Copy thread needs a staging area. If we don't build this,
our "VRAM Pages" step will stall waiting on CreateCommittedResource for uploads.
[Done] VRAM Pages per Tab (The Stack Allocator). Advice: Implement the "Double-Ended Stack" (Vertex Up, Index Down) here.
[ ] CPU-Side Free List Allocator. (The logic that tracks the holes).
[ ] Tab Management / View Management. (Integrating the heaps into the UI).
[ ] Basic Ribbon UI.

Phase 5: Advanced Features & Polish
[ ] Migrate to Shader Model 6.
[ ] VRAM Defragmentation. (Now safe to implement because RTT exists).
[ ] Click Selection / Window Selection. (Requires Raycasting against your CPU Free List/Data structures).
[ ] Instanced optimization for Pipes.
[ ] SSAO.
[ ] Upgrade Vertices to HDR + Tonemapping.
[ ] Transparency Sorting. (CPU Sort + Args Rebuild when camera stops moving).

Phase 6: Performance & Telemetry
[ ] Per-Tab VRAM Usage Graphs. (Helps identify memory leaks or inefficient usage).
[ ] Page Fragmentation Heatmap. (Visualize which pages are most fragmented).
[ ] Eviction Frequency Counters. (Track how often eviction occurs and its impact on performance).
[ ] Copy Queue Stall Tracking. (Identify bottlenecks in the copy thread).

Phase 7: Extreme performance optimizations (Only after all above is done and stable)
[ ] LOD Optimization. (Using instancing or compute shaders to manage levels of detail based on camera distance).
[ ] Compute Shader Frustum Culling. (To reduce the number of objects sent to the GPU).
[ ] Mesh Shader Implementation. (For supported hardware, to further reduce draw call overhead). (Only for pipes)
[ ] GPU-Based Defragmentation. (Offload defragmentation to the GPU to minimize CPU stalls).
[ ] Asynchronous Resource Creation. (Create resources on a background thread, since D3D12 device methods are
 free-threaded, to further reduce stalls during heap growth or defragmentation).
[ ] Page Level optimization: Static pages → single draw, Semi-dynamic pages → EI,
 Highly dynamic pages → EI + GPU compaction

Not to do list:
Multi-GPU Rendering. (Too complex for initial implementation, and Windows' multi-adapter support is limited).
Face-wise Geometry colors. (Implementation detail). Maybe necessary for future mechanical parts.

*/

#pragma once
#include <cmath>          // hypotf, cosf, sinf
#include <DirectXMath.h>

struct CameraState { // Each view gets its own camera state.
    // This is part of the "View" data structure, not the "Tab" data structure. Each tab can have multiple views.
    DirectX::XMFLOAT3 position;
    DirectX::XMFLOAT3 target;
    DirectX::XMFLOAT3 up;
    float fov;
    float aspect;
    float nearZ;
    float farZ;

    CameraState() { Initialize(); }
    void Initialize() {
        position = { 0.0f, -10.0f, 2.0f };
        target = { 0.0f, 0.0f, 0.0f };
        up = { 0.0f, 0.0f, 1.0f }; // Z-Up is perfect for an XY orbit.

        fov = DirectX::XMConvertToRadians(60.0f);
        aspect = 1.0f; // SAFE DEFAULT
        nearZ = 0.1f;
        farZ = 1000.0f;
    }
};

inline void UpdateCameraOrbit(CameraState& cam)
{
    static float rotationAngle = 0.0f; // Remove static when implementing tab UI.
    rotationAngle += 0.002f; // per-frame speed

    // Calculate the 2D radius from the target on the XY plane. We ignore Z here to prevent the "spiral away" bug.
    float dx = cam.position.x - cam.target.x;
    float dy = cam.position.y - cam.target.y;
    float radius = hypotf(dx, dy);
    if (radius < 0.001f) radius = 10.0f; // Safety check to prevent radius becoming 0 (which locks the camera)

    float x = cam.target.x + cosf(rotationAngle) * radius; // Orbit in XY plane
    float y = cam.target.y + sinf(rotationAngle) * radius;
    float z = cam.position.z; // Z remains static (height)
    cam.position = { x, y, z };
}
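The 24-byte vertex and the R10G10B10A2 normal packing described in the spec comment above can be sketched on the CPU side as follows. This is a minimal, portable sketch, not the engine's code: `Vertex24`, `QuantizeUnorm10`, `PackNormal` and `UnpackNormal` are hypothetical names, and the 8-byte color is assumed to hold four 16-bit channels (the HDR variant). The bit positions follow the DXGI layout, with X in the lowest 10 bits.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical 24-byte vertex matching the spec: 12 B position, 4 B packed normal, 8 B color.
struct Vertex24 {
    float px, py, pz;        // object-space position, 12 bytes
    uint32_t packedNormal;   // R10G10B10A2_UNORM bits, 4 bytes
    uint16_t color[4];       // assumed: four 16-bit color channels (HDR variant), 8 bytes
};
static_assert(sizeof(Vertex24) == 24, "vertex must stay 24 bytes");

// Map one component from [-1, 1] to [0, 1], clamp, then round to the 10-bit range [0, 1023].
inline uint32_t QuantizeUnorm10(float v) {
    float unorm = std::fmin(std::fmax(v * 0.5f + 0.5f, 0.0f), 1.0f);
    return static_cast<uint32_t>(std::lround(unorm * 1023.0f));
}

// X in bits 0-9, Y in bits 10-19, Z in bits 20-29; the 2 alpha bits stay 0.
inline uint32_t PackNormal(float nx, float ny, float nz) {
    assert(std::fabs(nx * nx + ny * ny + nz * nz - 1.0f) < 1e-3f && "expects a unit normal");
    return QuantizeUnorm10(nx) | (QuantizeUnorm10(ny) << 10) | (QuantizeUnorm10(nz) << 20);
}

// CPU mirror of the vertex-shader decode: Normal = Input.Normal * 2.0 - 1.0.
inline void UnpackNormal(uint32_t packed, float& nx, float& ny, float& nz) {
    auto decode = [](uint32_t bits) {
        return static_cast<float>(bits & 0x3FFu) / 1023.0f * 2.0f - 1.0f;
    };
    nx = decode(packed);
    ny = decode(packed >> 10);
    nz = decode(packed >> 20);
}
```

Decoding the packed (0, 0, 1) normal gives X and Y of about 0.001 rather than exactly 0, because 0.0 maps to step 511.5 and rounds to 512; that error is within the ~0.002 precision stated in the spec.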
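The hemispheric ambient formula from the spec, written out in plain C++ to make the mapping explicit. A minimal sketch: `Rgb`, `Lerp` and `HemisphericAmbient` are hypothetical names, and `Lerp` is given the same meaning as HLSL's `lerp(a, b, t) = a + t * (b - a)`. It assumes the Z-up world used by `CameraState`.

```cpp
#include <cassert>

struct Rgb { float r, g, b; };

// Same semantics as HLSL lerp(a, b, t) = a + t * (b - a).
inline Rgb Lerp(const Rgb& a, const Rgb& b, float t) {
    return { a.r + t * (b.r - a.r), a.g + t * (b.g - a.g), a.b + t * (b.b - a.b) };
}

// Hemispheric ambient term for a Z-up world (camera up = {0, 0, 1}).
// Factor = Normal.z * 0.5 + 0.5; AmbientLight = Lerp(GroundColor, SkyColor, Factor)
inline Rgb HemisphericAmbient(float normalZ, const Rgb& groundColor, const Rgb& skyColor) {
    assert(normalZ >= -1.0f && normalZ <= 1.0f && "decoded normal components lie in [-1, 1]");
    const float factor = normalZ * 0.5f + 0.5f;  // facing down -> 0, facing up -> 1
    return Lerp(groundColor, skyColor, factor);
}
```

A normal pointing straight up gets the pure sky color, one pointing straight down gets the ground color, and vertical faces get the average of the two.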
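The 48-byte world matrix relies on the last row of an affine transform always being (0, 0, 0, 1). The sketch below assumes the column-vector convention (p' = M * p, translation in the fourth column) so that the dropped part really is a row; with DirectXMath's row-vector convention the constant part is the last column, so the matrix has to be transposed before the last row can be dropped. `PackedWorldMatrix`, `MakeTranslation`, `TransformPoint` and `ExpandTo4x4` are hypothetical names.

```cpp
#include <cassert>

// Rows 0-2 of a 4x4 affine matrix (column-vector convention).
// Row 3 is always (0, 0, 0, 1), so it is not stored: 3 x 4 floats = 48 bytes.
struct PackedWorldMatrix {
    float m[3][4];
};
static_assert(sizeof(PackedWorldMatrix) == 48, "3 x 4 floats = 48 bytes");

struct Float3 { float x, y, z; };

inline PackedWorldMatrix MakeTranslation(float tx, float ty, float tz) {
    return { { { 1, 0, 0, tx }, { 0, 1, 0, ty }, { 0, 0, 1, tz } } };
}

// What the vertex shader does: treat the vertex as (x, y, z, 1); the implied row 3 would only yield w = 1.
inline Float3 TransformPoint(const PackedWorldMatrix& w, const Float3& p) {
    auto row = [&](int r) { return w.m[r][0] * p.x + w.m[r][1] * p.y + w.m[r][2] * p.z + w.m[r][3]; };
    return { row(0), row(1), row(2) };
}

// Rebuild the full 4x4 when needed, e.g. before combining with View/Proj.
inline void ExpandTo4x4(const PackedWorldMatrix& w, float out[4][4]) {
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 4; ++c) out[r][c] = w.m[r][c];
    out[3][0] = out[3][1] = out[3][2] = 0.0f;
    out[3][3] = 1.0f;
}
```

Moving 1000 objects then uploads 1000 x 48 = 48,000 bytes of matrices instead of 64,000, and no vertex data at all.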
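The double-ended stack from PAGE STRUCTURE, together with the `IsFull` check of `GeometryPage` in the engine code that follows, can be sketched as a small allocator. This is a hedged illustration with hypothetical names (`PageStackAllocator`, `Allocate`, `MiddleFreeBytes`). One deliberate difference from `IsFull`: it assumes each page is drawn through a single vertex buffer view starting at byte 0, in which case `BaseVertexLocation` (counted in vertices) requires every vertex offset to be a multiple of the 24-byte stride, so offsets are aligned to the stride rather than to 16 bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Vertices grow up from 0, indices grow down from pageSize, and a 64-byte gap is always kept in between.
class PageStackAllocator {
public:
    static constexpr uint32_t kSafetyGap = 64;
    static constexpr uint32_t kVertexStride = 24;  // the 24-byte vertex format

    struct Placement { uint32_t vertexByteOffset; uint32_t indexByteOffset; };

    explicit PageStackAllocator(uint32_t pageSize) : indexTail_(pageSize) {}

    // indexSizeBytes is 2 (16-bit page) or 4 (32-bit page). Returns nullopt when the page is full.
    std::optional<Placement> Allocate(uint32_t vertexBytes, uint32_t indexBytes, uint32_t indexSizeBytes) {
        assert(indexSizeBytes == 2 || indexSizeBytes == 4);
        assert(vertexBytes % kVertexStride == 0 && indexBytes % indexSizeBytes == 0);
        // BaseVertexLocation counts vertices, so vertex offsets must be multiples of the stride.
        const uint32_t vStart = (vertexHead_ + kVertexStride - 1) / kVertexStride * kVertexStride;
        if (indexBytes > indexTail_) return std::nullopt;  // avoid unsigned wrap-around
        // StartIndexLocation counts indices; a 4-byte boundary suits both index sizes.
        const uint32_t iStart = (indexTail_ - indexBytes) & ~3u;
        if (uint64_t(vStart) + vertexBytes + kSafetyGap > iStart) return std::nullopt;
        vertexHead_ = vStart + vertexBytes;
        indexTail_ = iStart;
        return Placement{ vStart, iStart };
    }

    // Contiguous middle free space: the value a per-tab free list would index.
    uint32_t MiddleFreeBytes() const { return indexTail_ - vertexHead_; }

private:
    uint32_t vertexHead_ = 0;
    uint32_t indexTail_;
};
```

When `Allocate` returns `std::nullopt`, the caller moves on to another page or creates a new one; nothing in the page changes on failure.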
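The per-tab free list from the spec ("a 12KB middle gap in Page 3, a 40KB middle gap in Page 8") only has to answer one question: which page has a large enough middle gap? The sketch below uses an ordered `std::multimap` keyed by gap size as a simple stand-in for a true segregated free list; `lower_bound` returns the smallest sufficient gap (best fit) in O(log n) without iterating over page objects. `TabMiddleGapIndex`, `SetGap`, `Remove` and `FindPage` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <unordered_map>

class TabMiddleGapIndex {
public:
    // Record or update the contiguous middle gap of one page (internal holes are not tracked here).
    void SetGap(uint32_t pageIndex, uint32_t freeBytes) {
        Remove(pageIndex);
        byPage_[pageIndex] = bySize_.emplace(freeBytes, pageIndex);
    }

    void Remove(uint32_t pageIndex) {
        auto it = byPage_.find(pageIndex);
        if (it == byPage_.end()) return;
        bySize_.erase(it->second);
        byPage_.erase(it);
    }

    // Best fit: the page with the smallest middle gap that still holds the request.
    // requestBytes should already include vertex + index bytes + the 64-byte safety gap.
    std::optional<uint32_t> FindPage(uint32_t requestBytes) const {
        auto it = bySize_.lower_bound(requestBytes);
        if (it == bySize_.end()) return std::nullopt;  // caller creates a new page
        return it->second;
    }

private:
    std::multimap<uint32_t, uint32_t> bySize_;  // gap bytes -> page index
    std::unordered_map<uint32_t, std::multimap<uint32_t, uint32_t>::iterator> byPage_;
};
```

After every upload or defragmentation the copy thread would call `SetGap` with the page's new middle gap; an empty result from `FindPage` means a new page must be created.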
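The defragmentation rules (trigger above 25% holes, copy live objects into a new page, regenerate rather than patch the argument buffer) can be sketched as a pure planning step on the CPU metadata. Assumptions, all hypothetical: "25% holes" is measured against the occupied bytes of the page (live + holes); `PlacementRecord` is a trimmed mirror of `GeometryPlacementRecordInPage`; and each `CopyRegion` would become one `CopyBufferRegion` call on the copy queue, reading the old page and writing the new one.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Trimmed mirror of GeometryPlacementRecordInPage (only the fields used here).
struct PlacementRecord {
    uint64_t objectID;
    uint32_t vertexByteOffset, vertexSize;
    uint32_t indexByteOffset, indexSize;
    uint32_t indexCount, matrixIndex;
    bool isDeleted;
};

struct CopyRegion { uint32_t srcOffset, dstOffset, numBytes; };  // old page -> new page

// Same layout as IndirectCommand: root-constant matrix index + D3D12_DRAW_INDEXED_ARGUMENTS fields.
struct DrawArgs {
    uint32_t matrixIndex;
    uint32_t indexCountPerInstance, instanceCount, startIndexLocation;
    int32_t baseVertexLocation;
    uint32_t startInstanceLocation;
};

// Assumption: holes are compared with the occupied bytes (live + holes).
inline bool NeedsDefrag(uint32_t holeBytes, uint32_t liveBytes) {
    const uint64_t occupied = uint64_t(holeBytes) + liveBytes;
    return occupied != 0 && uint64_t(holeBytes) * 4 > occupied;  // > 25 %
}

struct DefragPlan {
    std::vector<PlacementRecord> records;  // live objects with their new offsets
    std::vector<CopyRegion> copies;        // the old page is only ever read
    std::vector<DrawArgs> args;            // argument buffer regenerated, never patched
};

inline DefragPlan PlanDefrag(const std::vector<PlacementRecord>& old, uint32_t pageSize,
                             uint32_t vertexStride, uint32_t indexSizeBytes) {
    DefragPlan plan;
    uint32_t vertexHead = 0, indexTail = pageSize;
    for (const PlacementRecord& r : old) {
        if (r.isDeleted) continue;  // holes simply disappear
        assert(r.vertexSize % vertexStride == 0 && r.indexSize % indexSizeBytes == 0);
        assert(r.indexSize <= indexTail);
        PlacementRecord moved = r;
        moved.vertexByteOffset = vertexHead;
        indexTail -= r.indexSize;
        moved.indexByteOffset = indexTail;
        vertexHead += r.vertexSize;
        plan.copies.push_back({ r.vertexByteOffset, moved.vertexByteOffset, r.vertexSize });
        plan.copies.push_back({ r.indexByteOffset, moved.indexByteOffset, r.indexSize });
        plan.args.push_back({ moved.matrixIndex, moved.indexCount, 1u,
                              moved.indexByteOffset / indexSizeBytes,
                              static_cast<int32_t>(moved.vertexByteOffset / vertexStride), 0u });
        plan.records.push_back(moved);
    }
    assert(vertexHead + 64u <= indexTail);  // the 64-byte middle gap survives compaction
    return plan;
}
```

Because the plan only reads the old page and writes a fresh one, it respects the spec's invariant that published pages are never written.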
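The publish-and-retire scheme of LOCK FREE VRAM MANAGEMENT (and `TabGeometryStorage::activeSnapshot` in the engine code) is a read-copy-update pattern. Below is a minimal sketch of it with standard atomics; `Snapshot`, `SnapshotPublisher`, `Acquire`, `Publish` and `Collect` are hypothetical names. It assumes the caller supplies `minCompletedFence` as a single, monotonically increasing value that every render thread has passed, meaning each thread has stopped using the old snapshot on the CPU and its GPU work that referenced it has completed (observed, for example, through each queue's `ID3D12Fence::GetCompletedValue`). How that combined value is produced is not defined by the spec.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Snapshot { std::vector<int> pageIds; };  // stand-in for GeometryPageSnapshot

class SnapshotPublisher {
public:
    ~SnapshotPublisher() {
        delete active_.load();
        for (Retired& r : retired_) delete r.snapshot;
    }

    // Render threads: one acquire load per frame; a published snapshot is never mutated.
    const Snapshot* Acquire() const { return active_.load(std::memory_order_acquire); }

    // Copy thread only: publish a fully built snapshot and queue the previous one for retirement.
    void Publish(Snapshot* next, uint64_t retireFence) {
        assert(next != nullptr && "publish only fully built snapshots");
        Snapshot* old = active_.exchange(next, std::memory_order_acq_rel);
        if (old) retired_.push_back({ old, retireFence });
    }

    // Copy thread only: free snapshots whose retire fence every render thread has passed.
    std::size_t Collect(uint64_t minCompletedFence) {
        std::size_t freed = 0;
        for (std::size_t i = 0; i < retired_.size();) {
            if (retired_[i].retireFence <= minCompletedFence) {
                delete retired_[i].snapshot;
                retired_[i] = retired_.back();  // swap-and-pop, order does not matter
                retired_.pop_back();
                ++freed;
            } else {
                ++i;
            }
        }
        return freed;
    }

private:
    struct Retired { Snapshot* snapshot; uint64_t retireFence; };
    std::atomic<Snapshot*> active_{ nullptr };
    std::vector<Retired> retired_;
};
```

Render threads never take a lock; only the copy thread calls `Publish` and `Collect`, so the retire list needs no synchronization of its own.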
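The batch hierarchy under OBJECT REPRESENTATION has four binary axes, giving 2 x 2 x 2 x 2 = 16 variants. Below is a hedged sketch of one way to encode them and to order the passes so that transparent variants are drawn last; the bit assignments and names (`BatchState`, `VariantId`, `VariantDrawOrder`) are hypothetical. Note that in D3D12 the index format (16-bit vs 32-bit) is carried by the index buffer view, while topology type, culling and blending are pipeline state.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kIndex32Bit     = 1u << 0;  // 16-bit vs 32-bit indices
constexpr uint32_t kTransparentBit = 1u << 1;  // opaque vs transparent (blend state)
constexpr uint32_t kWireframeBit   = 1u << 2;  // triangle vs line topology
constexpr uint32_t kDoubleSidedBit = 1u << 3;  // back-face culling on vs off

struct BatchState { bool index32, transparent, wireframe, doubleSided; };

// 4-bit variant id in the range 0..15.
inline uint32_t VariantId(const BatchState& s) {
    return (s.index32 ? kIndex32Bit : 0u) | (s.transparent ? kTransparentBit : 0u) |
           (s.wireframe ? kWireframeBit : 0u) | (s.doubleSided ? kDoubleSidedBit : 0u);
}

// All 8 opaque variants first, then the 8 transparent ones, so blending
// happens over a depth buffer that already contains every opaque surface.
inline std::vector<uint32_t> VariantDrawOrder() {
    std::vector<uint32_t> order;
    for (uint32_t pass = 0; pass < 2; ++pass)
        for (uint32_t id = 0; id < 16; ++id)
            if (((id & kTransparentBit) != 0) == (pass == 1)) order.push_back(id);
    assert(order.size() == 16);
    return order;
}
```

Within each variant the renderer then issues one ExecuteIndirect per page of that variant, which is where the 16 x N batch count comes from.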
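The growth table (chained doubling 4 -> 8 -> 16 -> 32 -> 64 MB, lazy creation, "Big Buffer" fallback above the maximum page size) reduces to one small decision function. A minimal sketch with hypothetical names (`NextAllocation`, `Backing`, `GrowthDecision`); it assumes `requestBytes` already includes the vertex bytes, the index bytes and the 64-byte middle gap. The `(std::min)` spelling guards against the `min` macro from `<windows.h>`, which the visible engine code includes without `NOMINMAX`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr uint64_t kFirstPageBytes = 4ull << 20;   // 4 MB
constexpr uint64_t kMaxPageBytes   = 64ull << 20;  // 64 MB

enum class Backing { PagedChain, BigBuffer };
struct GrowthDecision { Backing backing; uint64_t bytes; };

// lastPageBytes == 0 means the tab has no page of this type yet (lazy creation).
inline GrowthDecision NextAllocation(uint64_t lastPageBytes, uint64_t requestBytes) {
    assert(requestBytes > 0);
    if (requestBytes > kMaxPageBytes)                 // "Big Buffer" fallback:
        return { Backing::BigBuffer, requestBytes };  // dedicated committed resource
    uint64_t next = lastPageBytes == 0 ? kFirstPageBytes
                                       : (std::min)(lastPageBytes * 2, kMaxPageBytes);
    while (next < requestBytes) next *= 2;            // grow until the request fits
    next = (std::min)(next, kMaxPageBytes);           // never exceed the 64 MB cap
    return { Backing::PagedChain, next };
}
```

Old pages in the chain are never moved; a new, larger page is simply appended to the tab's page list.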
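The LOD rule in the miscellaneous specification (use the lowest distance between an object and all view cameras of the tab, and coarsen under memory pressure) can be sketched as follows. Assumptions, all hypothetical: the distance is measured to the object's AABB (the `minX..maxZ` fields of `GeometryPlacementRecordInPage`) rather than to its center, LOD thresholds are a sorted list of distances, and memory pressure is modelled as a scale factor of at least 1 applied to the distance. `DistanceToAabb`, `NearestViewDistance` and `SelectLod` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Float3 { float x, y, z; };
struct Aabb { Float3 lo, hi; };  // same corners as minX..maxZ in the placement record

// Euclidean distance from a point to a box (0 when the point is inside).
// (std::max)/(std::min) are parenthesized so they survive the min/max macros of <windows.h>.
inline float DistanceToAabb(const Float3& p, const Aabb& b) {
    const float dx = (std::max)({ b.lo.x - p.x, 0.0f, p.x - b.hi.x });
    const float dy = (std::max)({ b.lo.y - p.y, 0.0f, p.y - b.hi.y });
    const float dz = (std::max)({ b.lo.z - p.z, 0.0f, p.z - b.hi.z });
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Lowest distance between the object and ALL view cameras of the tab.
inline float NearestViewDistance(const Aabb& box, const std::vector<Float3>& cameraPositions) {
    float best = std::numeric_limits<float>::infinity();
    for (const Float3& cam : cameraPositions) best = (std::min)(best, DistanceToAabb(cam, box));
    return best;
}

// LOD 0 is the finest level; pressureScale > 1 pushes objects to coarser levels sooner.
inline std::size_t SelectLod(float distance, const std::vector<float>& thresholds, float pressureScale = 1.0f) {
    assert(std::is_sorted(thresholds.begin(), thresholds.end()) && pressureScale >= 1.0f);
    const float d = distance * pressureScale;
    return static_cast<std::size_t>(std::upper_bound(thresholds.begin(), thresholds.end(), d) - thresholds.begin());
}
```

With thresholds {5, 20, 100}, an object 3 units from its nearest view stays at LOD 0 normally but drops to LOD 1 when the pressure scale is 2.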
Actual Code of our graphics engine.
1// Copyright (c) 2025-Present : Ram Shanker: All rights reserved.
2#pragma once
3
4//DirectX 12 headers. Best Place to learn DirectX12 is original Microsoft documentation.
5// https://learn.microsoft.com/en-us/windows/win32/direct3d12/direct3d-12-graphics
6// You need a good dose of prior C++ knowledge and Computer Fundamentals before learning DirectX12.
7// Expect to read at least 2 times before you start grasping it !
8
9//Tell the HLSL compiler to include debug information into the shader blob.
10#define D3DCOMPILE_DEBUG 1 //TODO: Remove from production build.
11#define WIN32_LEAN_AND_MEAN
12#include <windows.h> // MUST be before d3d12.h
13#include <d3d12.h> //Main DirectX12 API. Included from %WindowsSdkDir\Include%WindowsSDKVersion%\\um
14//helper structures Library. MIT Licensed. Added to the project as git submodule.
15//https://github.com/microsoft/DirectX-Headers/blob/main/include/directx/d3dx12.h
16#include <d3dx12.h>
17#include <dxgi1_6.h>
18#include <dxgidebug.h>
19#include <wrl.h>
20#include <d3dcompiler.h>
21#include <DirectXMath.h> //Where from? https://github.com/Microsoft/DirectXMath ?
22#include <vector>
23#include <string>
24#include <unordered_map>
25#include <random>
26#include <ctime>
27#include <iostream>
28#include <thread>
29#include <chrono>
30#include <map>
31#include <list>
32
33#include "ConstantsApplication.h"
34#include "MemoryManagerGPU.h"
35#include "UserInterface-DirectX12.h"
36#include "डेटा.h"
37
38using namespace Microsoft::WRL;
39
40//DirectX12 Libraries.
41#pragma comment(lib, "d3d12.lib") //%WindowsSdkDir\Lib%WindowsSDKVersion%\\um\arch
42#pragma comment(lib, "dxgi.lib")
43#pragma comment(lib, "d3dcompiler.lib")
44#pragma comment(lib, "dxguid.lib")
45
46/* Double buffering is preferred for CAD application due to low input lag.Caveat: If rendering time
47exceeds frame refresh interval, than strutting distortion will appear. However
48we low input latency outweighs the slight frame smoothness of triple buffering.
49Double buffering (2x) is also 50% more memory efficient Triple Buffering (3x). */
50const UINT FRAMES_PER_RENDERTARGETS = 2; //Initially we are going with double buffering.
51
52// Constants
53constexpr UINT64 MaxVertexBufferSize = 1024 * 1024 * 64; // 64 MB
54constexpr UINT64 MaxIndexBufferSize = 1024 * 1024 * 16; // 16 MB
55
56// Represents complete geometry and index data associated with 1 engineering object..
57// This structure holds information about a resource allocated in GPU memory (VRAM)
58struct GpuResourceVertexIndexInfo {
59 ComPtr<ID3D12Resource> vertexBuffer;
60 D3D12_VERTEX_BUFFER_VIEW vertexBufferView;
61 ComPtr<ID3D12Resource> indexBuffer;
62 D3D12_INDEX_BUFFER_VIEW indexBufferView;
63 UINT indexCount;
64 uint32_t matrixIndex = 0;
65
66 //TODO: Latter on we will generalize this structure to hold textures, materials, shaders etc.
67 // Currently we are letting the Drive manage the GPU memory fragmentation. Latter we will manage it ourselves.
68 //uint64_t vramOffset; // Simulated VRAM address
69 //uint64_t size;
70 // In a real DX12 app, this would hold ID3D12Resource*, D3D12_VERTEX_BUFFER_VIEW, etc.
71};
72
73struct IndirectCommand { // OPTIMIZED Indirect Command
74 uint32_t matrixIndex; // 4 Bytes (Root Constant b1)
75 // Since Jumbo buffer ( or pages in future ) remains same, we bind it once.
76 // REMOVED: D3D12_VERTEX_BUFFER_VIEW vbv (Saved 16 Bytes)
77 // REMOVED: D3D12_INDEX_BUFFER_VIEW ibv (Saved 16 Bytes)
78 D3D12_DRAW_INDEXED_ARGUMENTS drawArguments;// 20 Bytes
79}; // Total size: 24 Bytes (down from 56 Bytes!)
80static_assert(sizeof(IndirectCommand) == 24, "IndirectCommand must be exactly 24 bytes.");
81
82/* Page Metadata: GeometryPlacementRecordInPage (CPU-side only).
83One entry per geometry object inside a GeometryPage. Used by Copy Thread for defragmentation,
84rebuilds, and future features. (frustum culling, ray-cast selection, LOD, etc.).
85Total size = 56 bytes (tightly packed, cache-friendly). */
86struct GeometryPlacementRecordInPage {
87 uint64_t objectID; // Unique 64-bit ID across entire process (unchanged)
88
89 // Byte offsets into this page's vertex/index buffers (page max = 4 MB → uint32_t is safe)
90 // Vertex region (grows upward)
91 uint32_t vertexByteOffset; // Start of this object's vertices in the page (bytes)
92 uint32_t vertexSize; // In bytes
93
94 // Index region (grows downward)
95 uint32_t indexByteOffset; // Start of this object's indices in the page (bytes)
96 uint32_t indexSize; // In bytes
97
98 uint32_t indexCount; // Number of indices (not bytes) For ExecuteIndirect
99 uint32_t matrixIndex; // Index into the per-tab WorldMatrix structured buffer
100
101 // Axis-Aligned Bounding Box (AABB) – stored as float32 only (24 bytes total)
102 // Always present for future use (frustum culling, selection, etc.).
103 // Set to {0,0,0} / {0,0,0} if we don't need it yet – costs nothing extra.
104 float minX, minY, minZ, maxX, maxY, maxZ; // Minimum corner (X,Y,Z) Maximum corner (X,Y,Z)
105
106 // Optional padding for perfect 8-byte alignment (not needed – compiler will pad anyway)
107 bool isDeleted = false; // Marked for deletion (soft delete, for defragmentation)
108};
109
110static_assert(sizeof(GeometryPlacementRecordInPage) == 64,
111 "GeometryPlacementRecordInPage must be exactly 64 bytes for optimal cache/line usage.");
112
113struct GeometryPage {
114 // GPU RESOURCES. Single unified 4 MB buffer
115 Microsoft::WRL::ComPtr<ID3D12Resource> buffer;// Layout:[Vertex Region ↑ ][Free Space][ Index Region ↓ ]
116 Microsoft::WRL::ComPtr<ID3D12Resource> indirectBuffer;// ExecuteIndirect argument buffer for this page
117 uint32_t indirectCount = 0; // Number of valid indirect draw commands
118
119 // ALLOCATION STATE (CPU-side only)
120 uint32_t vertexHead = 0; // Vertex region grows upward from 0
121 // Index region grows downward from pageSize
122 uint32_t indexTail = 0; // Initialized to pageSize
123 uint32_t pageSize = 0; // Typically 4 * 1024 * 1024
124 static constexpr uint32_t SAFETY_GAP = 64; // alignment guard
125
126 // FRAGMENTATION TRACKING
127 uint32_t liveBytes = 0; // Actively used bytes
128 uint32_t holeBytes = 0; // Deleted object space
129 uint32_t objectCount = 0; // Active objects
130
131 // VERSIONING & LIFETIME CONTROL
132 uint32_t version = 0; // Incremented on rebuild
133 std::atomic<bool> published = false; // Immutable once true
134 uint64_t retireFence = 0; // Fence value after which this page is safe to destroy
135
136 std::vector<GeometryPlacementRecordInPage> objects; // CPU METADATA (NO GEOMETRY STORED)
137
138 // UTILITY
139 bool IsFull(uint32_t incomingVertexBytes, uint32_t incomingIndexBytes) const {
140 //If: incomingIndexBytes > indexTail then : indexTail - incomingIndexBytes wraps to huge value.
141 if (incomingIndexBytes > indexTail) return true;
142 uint32_t alignedVertexHead = AlignUp(vertexHead, 16);
143 uint32_t alignedIndexTail = AlignDown(indexTail - incomingIndexBytes, 4);
144 return (alignedVertexHead + incomingVertexBytes + SAFETY_GAP > alignedIndexTail);
145 }
146
147 static uint32_t AlignUp(uint32_t value, uint32_t alignment) {
148 return (value + alignment - 1) & ~(alignment - 1);
149 }
150
151 static uint32_t AlignDown(uint32_t value, uint32_t alignment) {
152 return value & ~(alignment - 1);
153 }
154};
155
156struct BigGeometryObject {
157 Microsoft::WRL::ComPtr<ID3D12Resource> buffer;
158 Microsoft::WRL::ComPtr<ID3D12Resource> indirectBuffer;
159 uint32_t indexCount = 0;
160 uint32_t matrixIndex = 0;
161 uint64_t retireFence = 0;
162 std::atomic<bool> published = false;
163};
164
165struct GeometryPageSnapshot {// A lightweight, immutable snapshot of the current pages.
166 // We use raw pointers here because the Render thread only needs to observe them.
167 // Iterating over a contiguous array of pointers is extremely cache-friendly.
168 std::vector<GeometryPage*> pages;
169};

struct TabGeometryStorage {
    // THE RCU (Read-Copy-Update) POINTER: Render threads read this, the Copy thread writes to it.
    std::atomic<GeometryPageSnapshot*> activeSnapshot{ nullptr };
    // WRITER-ONLY STATE: Only the Copy thread touches these, so they need no locks/atomics.
    std::vector<std::unique_ptr<GeometryPage>> activePages; // Actually owns the memory

    // Cleanup queues for the Copy thread
    struct RetiredSnapshot { GeometryPageSnapshot* snapshot; uint64_t retireFence; };
    struct RetiredPage { std::unique_ptr<GeometryPage> page; uint64_t retireFence; };
    std::vector<RetiredSnapshot> retiredSnapshots;
    std::vector<RetiredPage> retiredPages;

    /* TODO: RCU versions of all the following vectors still need to be developed. Only the 1st is done so far.
    std::vector<std::unique_ptr<GeometryPage>> opaquePages; // Opaque geometry pages
    std::vector<std::unique_ptr<GeometryPage>> transparentPages; // Transparent geometry pages
    std::vector<std::unique_ptr<GeometryPage>> wireframePages; // Wireframe pages (if used)
    std::vector<std::unique_ptr<BigGeometryObject>> bigObjects; // Dedicated large objects
    std::atomic<uint32_t> currentVersion = 0;
    std::vector<std::unique_ptr<GeometryPage>> retiredPages;
    */
};

/* DirectX 12 resources are organized at 3 levels:
1. The Data   : Per Tab (Jumbo Buffers for geometry data, materials, textures,
                Pipeline State Object, Root Signature, Command Signature etc.)
2. The Target : Per Window (Swap Chain, Render Targets, Depth Stencil Buffer etc.)
3. The Worker : Per Render Thread, one for each monitor (Command Queue, Command List etc.;
                resources shared across all windows on the same monitor) */

struct DX12ResourcesPerTab { // (The Data) Geometry Data

    // Upload Heaps (CPU -> GPU Transfer)
    // Moved here because the Copy Thread writes to these when adding objects to the TAB.
    ComPtr<ID3D12Resource> vertexBufferUpload;
    ComPtr<ID3D12Resource> indexBufferUpload;

    // Persistent Mapped Pointers (CPU Address)
    UINT8* pVertexDataBegin = nullptr; // Pointer for mapped vertex upload buffer
    UINT8* pIndexDataBegin = nullptr; // Pointer for mapped index upload buffer

    // TODO: Generalize this to hold materials, shaders, textures etc. unique to this project/tab.
    ComPtr<ID3D12DescriptorHeap> srvHeap;

    mutable std::mutex objectsOnGPUMutex; // mutable so that const references can lock it in rendering paths.
    // The Copy thread updates the following map whenever it adds/removes/modifies an object on the GPU.
    std::map<uint64_t, GpuResourceVertexIndexInfo> objectsOnGPU;

    // The Copy thread owns/writes the following variables exclusively. Render threads only read them, without a lock.
    ComPtr<ID3D12Resource> worldMatrixBuffer; // TODO: Double-buffer it per frame.
    UINT8* pWorldMatrixDataBegin = nullptr;
    uint32_t matrixCapacity = 4096;
    uint32_t matrixCount = 0;
    std::vector<uint32_t> freeMatrixSlots; // Free-list of matrix indices,
    // so that slots can be reused when objects are removed.

    // rootSignature & pipelineState initially lived in PerWindow. They moved here when the
    // commandSignature and indirect drawing infrastructure were added, since the
    // Root Signature and Pipeline State are closely tied to the command signature.
    ComPtr<ID3D12RootSignature> rootSignature;
    ComPtr<ID3D12PipelineState> pipelineState;

    ComPtr<ID3D12CommandSignature> commandSignature; // Indirect Drawing

    CameraState camera; // Updated per frame.
    // Currently per tab; later this will be per view, since each tab can have multiple views.
};

struct DX12ResourcesPerWindow { // Presentation Logic
    int WindowWidth = 800; // Current viewport (rendering area) size, excluding the task bar etc.
    int WindowHeight = 600;
    ID3D12CommandQueue* creatorQueue = nullptr; // Tracks which queue this window was created with,
    // to assist with migrations.

    ComPtr<IDXGISwapChain3> swapChain; // The link to the OS Window
    //ComPtr<ID3D12CommandQueue> commandQueue; // Moved to OneMonitorController
    ComPtr<ID3D12DescriptorHeap> rtvHeap;
    ComPtr<ID3D12Resource> renderTargets[FRAMES_PER_RENDERTARGETS];

    // Render To Texture Infrastructure
    ComPtr<ID3D12Resource> renderTextures[FRAMES_PER_RENDERTARGETS];
    ComPtr<ID3D12DescriptorHeap> rttRtvHeap;
    ComPtr<ID3D12DescriptorHeap> rttSrvHeap;

    // TODO: When we implement HDR support, the render-to-texture format (rttFormat) will have to change to the following.
    //DXGI_FORMAT rttFormat = DXGI_FORMAT_R16G16B16A16_FLOAT; // HDR ready

    ComPtr<ID3D12Resource> depthStencilBuffer; // Depth Buffer (Sized to the window dimensions)
    ComPtr<ID3D12DescriptorHeap> dsvHeap;

    D3D12_VIEWPORT viewport; // Viewport & Scissor (Dependent on Window Size).
    D3D12_RECT scissorRect;

    ComPtr<ID3D12Resource> constantBuffer;
    ComPtr<ID3D12DescriptorHeap> cbvHeap;
    UINT8* cbvDataBegin = nullptr;

    UINT frameIndex = 0; // Remember this is different from allocatorIndex in the Render Thread.
    // It can change even during a window resize.
};

struct DX12ResourcesPerRenderThread { // One of these is created for each monitor.
    // For convenience only. It simply points to OneMonitorController.commandQueue
    ComPtr<ID3D12CommandQueue> commandQueue;

    // Note that there are as many render threads as monitors attached.
    // Command Allocators MUST be unique to the thread.
    // We need one per frame in flight to avoid resetting an allocator while the GPU is still reading it.
    ComPtr<ID3D12CommandAllocator> commandAllocators[FRAMES_PER_RENDERTARGETS];
    UINT allocatorIndex = 0; // Remember this is different from the per-window frameIndex.

    // The Command List (the recording pen). Can be reset and reused for multiple windows within the same frame.
    ComPtr<ID3D12GraphicsCommandList> commandList;

    // Synchronization (Per Window VSync)
    HANDLE fenceEvent = nullptr;
    ComPtr<ID3D12Fence> fence; // TODO: Discard this and use the fence inside the monitor controller (renderFence).
};

struct OneMonitorController { // Variables stored per monitor.
    // System-fetched information.
    bool isScreenInitalized = false;
    int screenPixelWidth = 800;
    int screenPixelHeight = 600;
    int screenPhysicalWidth = 0; // in mm
    int screenPhysicalHeight = 0; // in mm
    int WindowWidth = 800; // Current viewport (rendering area) size, excluding the task bar etc.
    int WindowHeight = 600;

    HMONITOR hMonitor = NULL; // Monitor handle. Remains fixed as long as the monitor is not disconnected / disabled.
    std::wstring monitorName; // Monitor device name (e.g., "\\\\.\\DISPLAY1")
    std::wstring friendlyName; // Human readable name (e.g., "Dell U2720Q")
    RECT monitorRect; // Full monitor rectangle
    RECT workAreaRect; // Work area (excluding task bar)
    int dpiX = 96; // DPI X
    int dpiY = 96; // DPI Y
    double scaleFactor = 1.0; // Scale factor (100% = 1.0, 125% = 1.25, etc.)
    bool isPrimary = false; // Is this the primary monitor?
    DWORD orientation = DMDO_DEFAULT; // Monitor orientation
    int refreshRate = 60; // Refresh rate in Hz
    int colorDepth = 32; // Color depth in bits per pixel

    bool isVirtualMonitor = false; // To support headless mode.

    // DirectX12 Resources.
    // TODO: Move these to the per render thread structure.
    ComPtr<ID3D12CommandQueue> commandQueue; // Persistent. Survives thread restarts.
    bool hasActiveThread = false; // Whether this specific monitor is currently being serviced by a thread.
    ComPtr<ID3D12Fence> renderFence; // Signalled each frame by GpuRenderThread
    uint64_t renderFenceValue = 0; // Last value signalled (written by render thread)
    // The above is intentionally NOT std::atomic, since gpu.renderFenceValue is the std::atomic serving all monitors.
    HANDLE renderFenceEvent = nullptr;
};

// Commands sent from Generator thread(s) to the Copy thread
enum class CommandToCopyThreadType { NONE = 0, ADD, MODIFY, REMOVE };
struct CommandToCopyThread
{
    CommandToCopyThreadType type = CommandToCopyThreadType::NONE;
    std::optional<GeometryData> geometry; // Present for ADD and MODIFY
    uint64_t id = 0; // Always present
    uint64_t tabID = 0; // We must know which tab this object belongs to!
};

extern std::atomic<bool> pauseRenderThreads; // Defined in Main.cpp

// Packet of work for a Render Thread for one frame
struct RenderPacket {
    uint64_t frameNumber = 0;
    std::vector<uint64_t> visibleObjectIds;
};

class HrException : public std::runtime_error // Simple exception helper for HRESULT checks
{
public:
    HrException(HRESULT hr) : std::runtime_error("HRESULT Exception"), hr(hr) {}
    HRESULT Error() const { return hr; }
private:
    const HRESULT hr;
};

inline void ThrowIfFailed(HRESULT hr) {
    if (FAILED(hr)) { throw HrException(hr); }
}


class ThreadSafeQueueGPU {
public:
    void push(CommandToCopyThread value) {
        std::lock_guard<std::mutex> lock(mutex);
        fifoQueue.push(std::move(value));
        cond.notify_one();
    }

    // Non-blocking pop
    bool try_pop(CommandToCopyThread& value) {
        std::lock_guard<std::mutex> lock(mutex);
        if (fifoQueue.empty()) { return false; }
        value = std::move(fifoQueue.front());
        fifoQueue.pop();
        return true;
    }

    // Shuts down the queue, waking up any waiting threads.
    // Note: there is no blocking pop yet, so nothing currently waits on 'cond' or reads 'shutdown'.
    void shutdownQueue() {
        std::lock_guard<std::mutex> lock(mutex);
        shutdown = true;
        cond.notify_all();
    }

private:
    std::queue<CommandToCopyThread> fifoQueue; // fifo = First-In First-Out
    std::mutex mutex;
    std::condition_variable cond;
    bool shutdown = false;
};

inline ThreadSafeQueueGPU g_gpuCommandQueue;

// VRAM Manager: This class handles the GPU memory dynamically.
// There will be exactly 1 object of this class in the entire application. Hence the special name.
// May Lord Shankar's grace remain with us. The corresponding object is named "gpu".
class शंकर {
public:
    OneMonitorController screens[MV_MAX_MONITORS];
    int currentMonitorCount = 0; // Global monitor count. It can be 0 when no monitors are found (headless mode)

    // IDXGIFactory6 / IDXGIAdapter4 prerequisite: Windows 10 1803+ / Windows 11
    ComPtr<IDXGIFactory6> factory6; // The OS-level display system manager. Can iterate over GPUs.
    ComPtr<IDXGIAdapter4> hardwareAdapter; // Represents a physical GPU device.

    // Represents 1 logical GPU device on the above GPU adapter. Helps create all DirectX12 memory / resources / commands etc.
    ComPtr<ID3D12Device> device; // Very Important: We support EXACTLY 1 GPU device only in this version.
    bool isGPUEngineInitialized = false; // TODO: To be implemented.
    DXGI_FORMAT rttFormat = DXGI_FORMAT_R8G8B8A8_UNORM;

    DX12ResourcesUI uiResources;

    // The following are to be added later:
    // ID3D12DescriptorHeapMgr ← Global descriptor allocator
    // Shader & PSO Cache ← Shared by all threads
    // AdapterInfo ← For device selection / VRAM stats

    /* We will have 1 Render Queue per monitor, which is local to the Render Thread.
    IMPORTANT: A GPU typically exposes a single 3D (graphics) engine. Even if 4 command lists are submitted
    to 4 independent queues, the graphics driver / WDDM scheduler time-slices them onto that engine,
    so they are effectively serialized.
    Still, we need 4 separate queues to properly handle different refresh rates.

    Ex: If we put all 4 windows on the same queue: Window A (60Hz) submits a Present command. The queue STALLS
    waiting for Monitor A's VSync interval. Window B (144Hz) submits a draw command.
    Window B cannot be processed because the queue is blocked by Window A's VSync wait.
    By using 4 queues, Queue A can sit blocked waiting for VSync,
    while Queue B immediately pushes work to the GPU for the faster monitor. */

    std::atomic<uint64_t> renderFenceValue = 0; // Global. This is in addition to the per-monitor render fence value.

    ComPtr<ID3D12CommandQueue> copyCommandQueue; // There is only 1 across the application.
    ComPtr<ID3D12Fence> copyFence; // Synchronization for Copy Queue
    std::atomic<uint64_t> copyFenceValue = 1; // Thread safe.
    // Start from 1 to avoid confusion with the default fence value of 0.
    HANDLE copyFenceEvent = nullptr;

public:
    // Maps our CPU ObjectID to its resource info in VRAM
    std::unordered_map<uint64_t, GpuResourceVertexIndexInfo> resourceMap;

    // Simulates a simple heap allocator with 16MB chunks
    uint64_t m_nextFreeOffset = 0;
    const uint64_t CHUNK_SIZE = 16 * 1024 * 1024;
    uint64_t m_vram_capacity = 4 * CHUNK_SIZE; // Simulate 64MB VRAM

    // When an object is updated, the old VRAM is put here to be freed later.
    struct DeferredFree {
        uint64_t frameNumber; // The frame it became obsolete
        GpuResourceVertexIndexInfo resource;
    };
    std::list<DeferredFree> deferredFreeQueue;

    // Allocate space in VRAM and return the handle. TODO: Clarify where this is used (currently disabled).
    // std::optional<GpuResourceVertexIndexInfo> Allocate(size_t size);

    // Descriptor sizes for RTV and CBV/SRV/UAV. We need these to calculate offsets in descriptor heaps.
    // They are initialized during device creation and remain constant, i.e. they are hardware properties of the GPU.
    // We store them here for easy access across threads.
    UINT rtvDescriptorSize = 0, cbvSrvUavDescriptorSize = 0; // Initialized during device creation.

    void ProcessDeferredFrees(uint64_t lastCompletedRenderFrame);

    //शंकर() {}; // Our Main function initializes DirectX12 global resources by calling InitD3DDeviceOnly().
    void InitD3DDeviceOnly();
    void InitD3DPerTab(DX12ResourcesPerTab& tabRes); // Call this when a new Tab is created
    void InitD3DPerWindow(DX12ResourcesPerWindow& dx, HWND hwnd, ID3D12CommandQueue* commandQueue);
    void PopulateCommandList(ID3D12GraphicsCommandList* cmdList, // Called by the per-monitor render thread.
        DX12ResourcesPerWindow& winRes, const DX12ResourcesPerTab& tabRes, TabGeometryStorage& storage);
    void WaitForPreviousFrame(const DX12ResourcesPerRenderThread& dx);
    void ResizeD3DWindow(DX12ResourcesPerWindow& dx, UINT newWidth, UINT newHeight);

    // Called when a monitor is unplugged or a window is destroyed. Destroys SwapChain/RTVs but KEEPS Geometry.
    void CleanupWindowResources(DX12ResourcesPerWindow& winRes);
    // Called when a TAB is closed by the user. Destroys the Jumbo Vertex/Index Buffers.
    void CleanupTabResources(DX12ResourcesPerTab& tabRes);
    // Called ONLY at application exit (wWinMain end). Destroys the Device, Factory, and Global Copy Queue.
    // Thread resources are cleaned up by the Render Thread itself before exit.
    void CleanupD3DGlobal();
};

void FetchAllMonitorDetails();
BOOL CALLBACK MonitorEnumProc(HMONITOR hMonitor, HDC hdcMonitor, LPRECT lprcMonitor, LPARAM dwData);

/*
IID_PPV_ARGS is a MACRO used in DirectX (and COM programming in general) to help safely and correctly
retrieve interface pointers during object creation or querying. It reduces repetitive typing.
COM interfaces are identified by unique GUIDs (IIDs). The macro supplies the IID of the pointer's
interface type, followed by the address of the pointer cast to void**.

Ex: IID_PPV_ARGS(&device) supplies two arguments, roughly equivalent to:
IID iid = __uuidof(ID3D12Device);
void** ppv = reinterpret_cast<void**>(&device);
*/

// Structure to hold transformation matrices
struct ConstantBuffer {
    DirectX::XMFLOAT4X4 viewProj; // 64 bytes
};

// Externs for communication
extern std::atomic<bool> shutdownSignal;

// Logic Thread "Fence"
extern std::mutex g_logicFenceMutex;
extern std::condition_variable g_logicFenceCV;
extern uint64_t g_logicFrameCount;

// Copy Thread "Fence"
extern std::mutex g_copyFenceMutex;
extern std::condition_variable g_copyFenceCV;
extern uint64_t g_copyFrameCount;

// TODO: Implement this. A real allocator would manage free lists and possibly defragment memory.
// Note: it relies on the vramOffset/size fields of GpuResourceVertexIndexInfo, which are currently commented out.
/*
std::optional<GpuResourceVertexIndexInfo> शंकर::Allocate(size_t size) {

    if (m_nextFreeOffset + size > m_vram_capacity) {
        std::cerr << "VRAM MANAGER: Out of memory!" << std::endl;
        // Here, the Main Logic thread would be signaled to reduce LOD.
        return std::nullopt;
    }
    GpuResourceVertexIndexInfo info{ m_nextFreeOffset, size };
    m_nextFreeOffset += size; // Simple bump allocator
    return info;
}*/

// Utility Functions

// Waits for the previous frame to complete rendering.
// TODO: Currently a no-op: the queue/fence members it used no longer live in DX12ResourcesPerWindow.
// Verify whether this function is still used. Passed by reference to avoid copying every ComPtr.
inline void WaitForGpu(const DX12ResourcesPerWindow& dx)
{
    /*
    dx.commandQueue->Signal(dx.fence.Get(), dx.fenceValue);
    dx.fence->SetEventOnCompletion(dx.fenceValue, dx.fenceEvent);
    WaitForSingleObjectEx(dx.fenceEvent, INFINITE, FALSE);
    dx.fenceValue++;*/
}

// Waits for a specific fence value to be reached.
// TODO: Currently a no-op as well (see WaitForGpu above). Verify whether this function is still used.
inline void WaitForFenceValue(const DX12ResourcesPerWindow& dx, UINT64 fenceValue)
{
    /*
    if (dx.fence->GetCompletedValue() < fenceValue)
    {
        ThrowIfFailed(dx.fence->SetEventOnCompletion(fenceValue, dx.fenceEvent));
        WaitForSingleObjectEx(dx.fenceEvent, INFINITE, FALSE);
    }*/
}

// Thread Functions
// Thread synchronization between the Main Logic thread and the Copy thread
inline std::mutex toCopyThreadMutex;
inline std::condition_variable toCopyThreadCV;
inline std::queue<CommandToCopyThread> commandToCopyThreadQueue;

// Thread Functions - Declarations only!
void GpuCopyThread();
void GpuRenderThread(int monitorId, int refreshRate);
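GeometryPage packs each object's vertices upward from offset 0 and its indices downward from pageSize. IsFull() checks that both still fit with SAFETY_GAP bytes left between the two regions. The sketch below shows how an allocation might then advance vertexHead and indexTail, using the same alignment rules (16 bytes for vertices, 4 for indices). It is a portable illustration only. PageAllocator, PageSlot and TryAllocate are hypothetical names, not the project's Copy-thread code, and the sketch assumes power-of-two alignments.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, D3D12-free mirror of GeometryPage's allocation state.
struct PageAllocator {
    uint32_t pageSize;
    uint32_t vertexHead = 0; // Vertex region grows upward from 0
    uint32_t indexTail;      // Index region grows downward from pageSize
    static constexpr uint32_t SAFETY_GAP = 64;

    explicit PageAllocator(uint32_t size) : pageSize(size), indexTail(size) {}

    static uint32_t AlignUp(uint32_t value, uint32_t alignment) {
        assert((alignment & (alignment - 1)) == 0 && "alignment must be a power of two");
        return (value + alignment - 1) & ~(alignment - 1);
    }
    static uint32_t AlignDown(uint32_t value, uint32_t alignment) {
        assert((alignment & (alignment - 1)) == 0 && "alignment must be a power of two");
        return value & ~(alignment - 1);
    }
};

// Hypothetical result: where one object's vertices and indices start inside the page.
struct PageSlot {
    uint32_t vertexByteOffset;
    uint32_t indexByteOffset;
};

// Reserves space for one object, or returns std::nullopt if the two regions would collide.
std::optional<PageSlot> TryAllocate(PageAllocator& page, uint32_t vertexBytes, uint32_t indexBytes) {
    if (indexBytes > page.indexTail) return std::nullopt; // avoid unsigned wrap-around
    uint32_t vertexStart = PageAllocator::AlignUp(page.vertexHead, 16);
    uint32_t indexStart = PageAllocator::AlignDown(page.indexTail - indexBytes, 4);
    // 64-bit sum so that vertexStart + vertexBytes cannot overflow.
    if (uint64_t(vertexStart) + vertexBytes + PageAllocator::SAFETY_GAP > indexStart) {
        return std::nullopt; // page is full: the caller would open a new page
    }
    page.vertexHead = vertexStart + vertexBytes;
    page.indexTail = indexStart;
    return PageSlot{ vertexStart, indexStart };
}
```

One deliberate difference from IsFull(): the collision test uses 64-bit arithmetic, so a very large vertexBytes cannot wrap around and slip past the check.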
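TabGeometryStorage uses a read-copy-update (RCU) scheme, which works in three steps:

1. The Copy thread builds a new immutable GeometryPageSnapshot.
2. It publishes the new snapshot with a single atomic pointer swap.
3. It parks the previous snapshot in retiredSnapshots until a fence value shows that no render thread can still be reading it.

The sketch below models that cycle without D3D12. It assumes a single writer thread. It also assumes the caller passes a retire fence value that is signalled only after every frame that might have observed the old snapshot. Snapshot, SnapshotStore, Publish, Observe and ReclaimRetired are hypothetical names. completedFence stands in for what ID3D12Fence::GetCompletedValue() would return.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for GeometryPageSnapshot: immutable once published.
struct Snapshot {
    std::vector<int> pages; // stand-in for std::vector<GeometryPage*>
};

// Stand-in for the RCU part of TabGeometryStorage (single writer, many readers).
struct SnapshotStore {
    std::atomic<Snapshot*> active{ nullptr };
    struct Retired { Snapshot* snapshot; uint64_t retireFence; };
    std::vector<Retired> retired; // touched by the writer only

    // Writer: swap in a new snapshot; keep the old one alive until 'retireFence' completes.
    void Publish(Snapshot* next, uint64_t retireFence) {
        assert(next != nullptr);
        Snapshot* old = active.exchange(next, std::memory_order_acq_rel);
        if (old) retired.push_back({ old, retireFence });
    }

    // Reader: the acquire load pairs with the writer's exchange, so the snapshot contents are visible.
    const Snapshot* Observe() const { return active.load(std::memory_order_acquire); }

    // Writer: delete every retired snapshot whose fence value has already been reached.
    std::size_t ReclaimRetired(uint64_t completedFence) {
        std::size_t freed = 0;
        for (std::size_t i = 0; i < retired.size();) {
            if (retired[i].retireFence <= completedFence) {
                delete retired[i].snapshot;
                retired[i] = retired.back(); // swap-remove; order does not matter here
                retired.pop_back();
                ++freed;
            } else {
                ++i;
            }
        }
        return freed;
    }

    ~SnapshotStore() {
        for (auto& r : retired) delete r.snapshot;
        delete active.load();
    }
};
```

Readers never take a lock. The cost is that memory is reclaimed late, only once the fence passes. That trade-off suits the render threads, which read every frame, while the Copy thread writes comparatively rarely.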
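ThreadSafeQueueGPU owns a condition variable and a shutdown flag. As listed, however, it only offers the non-blocking try_pop(), so nothing ever waits on 'cond' or reads 'shutdown'. The sketch below shows a blocking pop that returns false only once the queue has been shut down and drained. wait_and_pop is a hypothetical member name, and the queue is templated so that it compiles without CommandToCopyThread.

```cpp
#include <cassert> // used by the checks that follow
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-in for ThreadSafeQueueGPU with an added blocking pop.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            fifoQueue.push(std::move(value));
        }
        cond.notify_one(); // wake one waiting consumer
    }

    // Blocks until an item is available or shutdown is requested.
    // Returns false only when the queue is shut down AND empty, so pending work is drained first.
    bool wait_and_pop(T& value) {
        std::unique_lock<std::mutex> lock(mtx);
        cond.wait(lock, [this] { return shutdown || !fifoQueue.empty(); });
        if (fifoQueue.empty()) return false;
        value = std::move(fifoQueue.front());
        fifoQueue.pop();
        return true;
    }

    void shutdownQueue() {
        {
            std::lock_guard<std::mutex> lock(mtx);
            shutdown = true;
        }
        cond.notify_all(); // wake every waiter so each one can observe 'shutdown'
    }

private:
    std::queue<T> fifoQueue;
    std::mutex mtx;
    std::condition_variable cond;
    bool shutdown = false;
};
```

A Copy thread loop could then be written as `while (queue.wait_and_pop(cmd)) { ... }`. It exits cleanly after shutdownQueue() once the remaining commands have been processed.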
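DX12ResourcesPerTab tracks world-matrix slots with matrixCapacity, matrixCount and freeMatrixSlots, so that indices released by removed objects can be reused. The sketch below shows one possible allocate/release pair. MatrixSlotAllocator, AllocateSlot and ReleaseSlot are hypothetical names. In the real code, a released slot should only be reused after the GPU has finished reading it (compare the retireFence fields elsewhere in this header); the sketch leaves that to the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical allocator mirroring matrixCapacity / matrixCount / freeMatrixSlots in DX12ResourcesPerTab.
struct MatrixSlotAllocator {
    uint32_t matrixCapacity = 4096;
    uint32_t matrixCount = 0;              // high-water mark: slots [0, matrixCount) have been handed out
    std::vector<uint32_t> freeMatrixSlots; // released slots, reused last-in first-out

    // Returns a world-matrix index, or std::nullopt when the buffer is full.
    std::optional<uint32_t> AllocateSlot() {
        if (!freeMatrixSlots.empty()) {    // prefer recycling a released slot
            uint32_t slot = freeMatrixSlots.back();
            freeMatrixSlots.pop_back();
            return slot;
        }
        if (matrixCount < matrixCapacity) return matrixCount++;
        return std::nullopt;               // the caller would need to grow worldMatrixBuffer
    }

    // Returns a slot to the free-list. The caller must ensure the GPU no longer reads it.
    void ReleaseSlot(uint32_t slot) {
        assert(slot < matrixCount);
        freeMatrixSlots.push_back(slot);
    }
};
```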
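HrException stores the HRESULT but always reports the fixed text "HRESULT Exception", so a log does not show which error occurred. The sketch below is a small variation that puts the code into the message in hex; for example, 0x887A0005 is DXGI_ERROR_DEVICE_REMOVED. To stay portable, it uses a hypothetical signed 32-bit stand-in type, HResult, instead of the Windows HRESULT typedef. It tests 'hr < 0', which is what the FAILED macro checks. HrExceptionSketch, HrToString and ThrowIfFailedSketch are illustrative names only.

```cpp
#include <cassert> // used by the checks that follow
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the Windows HRESULT typedef (a signed 32-bit value).
using HResult = int32_t;

// Formats an HRESULT as e.g. "HRESULT 0x887A0005".
inline std::string HrToString(HResult hr) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "HRESULT 0x%08X",
                  static_cast<unsigned int>(static_cast<uint32_t>(hr)));
    return std::string(buf);
}

// Same shape as HrException, but the message carries the actual error code.
class HrExceptionSketch : public std::runtime_error {
public:
    explicit HrExceptionSketch(HResult hr) : std::runtime_error(HrToString(hr)), hr(hr) {}
    HResult Error() const { return hr; }
private:
    const HResult hr;
};

// Mirrors ThrowIfFailed: FAILED(hr) treats any negative HRESULT as a failure.
inline void ThrowIfFailedSketch(HResult hr) {
    if (hr < 0) { throw HrExceptionSketch(hr); }
}
```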
1// Copyright (c) 2025-Present : Ram Shanker: All rights reserved.
2#pragma once
3
4//DirectX 12 headers. Best Place to learn DirectX12 is original Microsoft documentation.
5// https://learn.microsoft.com/en-us/windows/win32/direct3d12/direct3d-12-graphics
6// You need a good dose of prior C++ knowledge and Computer Fundamentals before learning DirectX12.
7// Expect to read at least 2 times before you start grasping it !
8
9//Tell the HLSL compiler to include debug information into the shader blob.
10#define D3DCOMPILE_DEBUG 1 //TODO: Remove from production build.
11#define WIN32_LEAN_AND_MEAN
12#include <windows.h> // MUST be before d3d12.h
13#include <d3d12.h> //Main DirectX12 API. Included from %WindowsSdkDir\Include%WindowsSDKVersion%\\um
14//helper structures Library. MIT Licensed. Added to the project as git submodule.
15//https://github.com/microsoft/DirectX-Headers/blob/main/include/directx/d3dx12.h
16#include <d3dx12.h>
17#include <dxgi1_6.h>
18#include <dxgidebug.h>
19#include <wrl.h>
20#include <d3dcompiler.h>
21#include <DirectXMath.h> //Where from? https://github.com/Microsoft/DirectXMath ?
22#include <vector>
23#include <string>
24#include <unordered_map>
25#include <random>
26#include <ctime>
27#include <iostream>
28#include <thread>
29#include <chrono>
30#include <map>
31#include <list>
32
33#include "ConstantsApplication.h"
34#include "MemoryManagerGPU.h"
35#include "UserInterface-DirectX12.h"
36#include "डेटा.h"
37
38using namespace Microsoft::WRL;
39
40//DirectX12 Libraries.
41#pragma comment(lib, "d3d12.lib") //%WindowsSdkDir\Lib%WindowsSDKVersion%\\um\arch
42#pragma comment(lib, "dxgi.lib")
43#pragma comment(lib, "d3dcompiler.lib")
44#pragma comment(lib, "dxguid.lib")
45
46/* Double buffering is preferred for CAD application due to low input lag.Caveat: If rendering time
47exceeds frame refresh interval, than strutting distortion will appear. However
48we low input latency outweighs the slight frame smoothness of triple buffering.
49Double buffering (2x) is also 50% more memory efficient Triple Buffering (3x). */
50const UINT FRAMES_PER_RENDERTARGETS = 2; //Initially we are going with double buffering.
51
52// Constants
53constexpr UINT64 MaxVertexBufferSize = 1024 * 1024 * 64; // 64 MB
54constexpr UINT64 MaxIndexBufferSize = 1024 * 1024 * 16; // 16 MB
55
56// Represents complete geometry and index data associated with 1 engineering object..
57// This structure holds information about a resource allocated in GPU memory (VRAM)
58struct GpuResourceVertexIndexInfo {
59 ComPtr<ID3D12Resource> vertexBuffer;
60 D3D12_VERTEX_BUFFER_VIEW vertexBufferView;
61 ComPtr<ID3D12Resource> indexBuffer;
62 D3D12_INDEX_BUFFER_VIEW indexBufferView;
63 UINT indexCount;
64 uint32_t matrixIndex = 0;
65
66 //TODO: Latter on we will generalize this structure to hold textures, materials, shaders etc.
67 // Currently we are letting the Drive manage the GPU memory fragmentation. Latter we will manage it ourselves.
68 //uint64_t vramOffset; // Simulated VRAM address
69 //uint64_t size;
70 // In a real DX12 app, this would hold ID3D12Resource*, D3D12_VERTEX_BUFFER_VIEW, etc.
71};
72
73struct IndirectCommand { // OPTIMIZED Indirect Command
74 uint32_t matrixIndex; // 4 Bytes (Root Constant b1)
75 // Since Jumbo buffer ( or pages in future ) remains same, we bind it once.
76 // REMOVED: D3D12_VERTEX_BUFFER_VIEW vbv (Saved 16 Bytes)
77 // REMOVED: D3D12_INDEX_BUFFER_VIEW ibv (Saved 16 Bytes)
78 D3D12_DRAW_INDEXED_ARGUMENTS drawArguments;// 20 Bytes
79}; // Total size: 24 Bytes (down from 56 Bytes!)
80static_assert(sizeof(IndirectCommand) == 24, "IndirectCommand must be exactly 24 bytes.");
81
82/* Page Metadata: GeometryPlacementRecordInPage (CPU-side only).
83One entry per geometry object inside a GeometryPage. Used by Copy Thread for defragmentation,
84rebuilds, and future features. (frustum culling, ray-cast selection, LOD, etc.).
85Total size = 56 bytes (tightly packed, cache-friendly). */
86struct GeometryPlacementRecordInPage {
87 uint64_t objectID; // Unique 64-bit ID across entire process (unchanged)
88
89 // Byte offsets into this page's vertex/index buffers (page max = 4 MB → uint32_t is safe)
90 // Vertex region (grows upward)
91 uint32_t vertexByteOffset; // Start of this object's vertices in the page (bytes)
92 uint32_t vertexSize; // In bytes
93
94 // Index region (grows downward)
95 uint32_t indexByteOffset; // Start of this object's indices in the page (bytes)
96 uint32_t indexSize; // In bytes
97
98 uint32_t indexCount; // Number of indices (not bytes) For ExecuteIndirect
99 uint32_t matrixIndex; // Index into the per-tab WorldMatrix structured buffer
100
101 // Axis-Aligned Bounding Box (AABB) – stored as float32 only (24 bytes total)
102 // Always present for future use (frustum culling, selection, etc.).
103 // Set to {0,0,0} / {0,0,0} if we don't need it yet – costs nothing extra.
104 float minX, minY, minZ, maxX, maxY, maxZ; // Minimum corner (X,Y,Z) Maximum corner (X,Y,Z)
105
106 // Optional padding for perfect 8-byte alignment (not needed – compiler will pad anyway)
107 bool isDeleted = false; // Marked for deletion (soft delete, for defragmentation)
108};
109
110static_assert(sizeof(GeometryPlacementRecordInPage) == 64,
111 "GeometryPlacementRecordInPage must be exactly 64 bytes for optimal cache/line usage.");
112
113struct GeometryPage {
114 // GPU RESOURCES. Single unified 4 MB buffer
115 Microsoft::WRL::ComPtr<ID3D12Resource> buffer;// Layout:[Vertex Region ↑ ][Free Space][ Index Region ↓ ]
116 Microsoft::WRL::ComPtr<ID3D12Resource> indirectBuffer;// ExecuteIndirect argument buffer for this page
117 uint32_t indirectCount = 0; // Number of valid indirect draw commands
118
119 // ALLOCATION STATE (CPU-side only)
120 uint32_t vertexHead = 0; // Vertex region grows upward from 0
121 // Index region grows downward from pageSize
122 uint32_t indexTail = 0; // Initialized to pageSize
123 uint32_t pageSize = 0; // Typically 4 * 1024 * 1024
124 static constexpr uint32_t SAFETY_GAP = 64; // alignment guard
125
126 // FRAGMENTATION TRACKING
127 uint32_t liveBytes = 0; // Actively used bytes
128 uint32_t holeBytes = 0; // Deleted object space
129 uint32_t objectCount = 0; // Active objects
130
131 // VERSIONING & LIFETIME CONTROL
132 uint32_t version = 0; // Incremented on rebuild
133 std::atomic<bool> published = false; // Immutable once true
134 uint64_t retireFence = 0; // Fence value after which this page is safe to destroy
135
136 std::vector<GeometryPlacementRecordInPage> objects; // CPU METADATA (NO GEOMETRY STORED)
137
138 // UTILITY
139 bool IsFull(uint32_t incomingVertexBytes, uint32_t incomingIndexBytes) const {
140 //If: incomingIndexBytes > indexTail then : indexTail - incomingIndexBytes wraps to huge value.
141 if (incomingIndexBytes > indexTail) return true;
142 uint32_t alignedVertexHead = AlignUp(vertexHead, 16);
143 uint32_t alignedIndexTail = AlignDown(indexTail - incomingIndexBytes, 4);
144 return (alignedVertexHead + incomingVertexBytes + SAFETY_GAP > alignedIndexTail);
145 }
146
147 static uint32_t AlignUp(uint32_t value, uint32_t alignment) {
148 return (value + alignment - 1) & ~(alignment - 1);
149 }
150
151 static uint32_t AlignDown(uint32_t value, uint32_t alignment) {
152 return value & ~(alignment - 1);
153 }
154};
155
156struct BigGeometryObject {
157 Microsoft::WRL::ComPtr<ID3D12Resource> buffer;
158 Microsoft::WRL::ComPtr<ID3D12Resource> indirectBuffer;
159 uint32_t indexCount = 0;
160 uint32_t matrixIndex = 0;
161 uint64_t retireFence = 0;
162 std::atomic<bool> published = false;
163};
164
165struct GeometryPageSnapshot {// A lightweight, immutable snapshot of the current pages.
166 // We use raw pointers here because the Render thread only needs to observe them.
167 // Iterating over a contiguous array of pointers is extremely cache-friendly.
168 std::vector<GeometryPage*> pages;
169};
170
171struct TabGeometryStorage {
172 // THE RCU POINTER: Render threads read this, Copy thread writes to it.
173 std::atomic<GeometryPageSnapshot*> activeSnapshot{ nullptr };
174 // WRITER-ONLY STATE: Only the Copy thread touches these, so they need no locks/atomics.
175 std::vector<std::unique_ptr<GeometryPage>> activePages; // Actually owns the memory
176
177 // Cleanup queues for the Copy thread
178 struct RetiredSnapshot { GeometryPageSnapshot* snapshot; uint64_t retireFence; };
179 struct RetiredPage { std::unique_ptr<GeometryPage> page; uint64_t retireFence; };
180 std::vector<RetiredSnapshot> retiredSnapshots;
181 std::vector<RetiredPage> retiredPages;
182
183 /* TODO: RCU version of all of the following vectors need to be developed. Only 1st done so far.
184 std::vector<std::unique_ptr<GeometryPage>> opaquePages; // Opaque geometry pages
185 std::vector<std::unique_ptr<GeometryPage>> transparentPages; // Transparent geometry pages
186 std::vector<std::unique_ptr<GeometryPage>> wireframePages; // Wireframe pages (if used)
187 std::vector<std::unique_ptr<BigGeometryObject>> bigObjects; // Dedicated large objects
188 std::atomic<uint32_t> currentVersion = 0;
189 std::vector<std::unique_ptr<GeometryPage>> retiredPages;
190 */
191};
192
193/* DirectX 12 resources are organized at 3 levels:
1941. The Data : Per Tab (Jumbo Buffers for geometry data, materials, textures,
195 Pipeline State Object, Root Signature, Command Signature etc.)
1962. The Target : Per Window (Swap Chain, Render Targets, Depth Stencil Buffer etc.)
1973. The Worker : Per Render Thread. 1 For each monitor. (Command Queue, Command List etc.
198 Resources shared across multiple windows on the same monitor) */
199
200struct DX12ResourcesPerTab { // (The Data) Geometry Data
201
202 // Upload Heaps (CPU -> GPU Transfer)
203 // Moved here because the Copy Thread writes to these when adding objects to the TAB.
204 ComPtr<ID3D12Resource> vertexBufferUpload;
205 ComPtr<ID3D12Resource> indexBufferUpload;
206
207 // Persistent Mapped Pointers (CPU Address)
208 UINT8* pVertexDataBegin = nullptr;// Pointer for mapped vertex upload buffer
209 UINT8* pIndexDataBegin = nullptr; // Pointer for mapped index upload buffer
210
211 // TODO: We will generalize this to hold materials, shaders, textures etc. unique to this project/tab
212 ComPtr<ID3D12DescriptorHeap> srvHeap;
213
214 mutable std::mutex objectsOnGPUMutex;// Make mutex mutable so const references can lock it in rendering paths.
215 // Copy thread will update the following map whenever it adds/removes/modifies an object on GPU.
216 std::map<uint64_t, GpuResourceVertexIndexInfo> objectsOnGPU;
217
218 //Copy thread owns/writes following variables exclusively. Render threads only read it. Without Lock.
219 ComPtr<ID3D12Resource> worldMatrixBuffer; // TODO: Doublebuffer it per frame.
220 UINT8 * pWorldMatrixDataBegin = nullptr;
221 uint32_t matrixCapacity = 4096;
222 uint32_t matrixCount = 0;
223 std::vector<uint32_t> freeMatrixSlots; // free-list for matrix indices.
224 //To enable re-use of slots when objects are removed.
225
226 // Initially rootSignature & pipelineState were in PerWindow, but now moved here,
227 // when adding commandSignature and indirect drawing infrastructure.
228 // Since Root Signature and Pipeline State are closely tied to the command signature,
229 ComPtr<ID3D12RootSignature> rootSignature;
230 ComPtr<ID3D12PipelineState> pipelineState;
231
232 ComPtr<ID3D12CommandSignature> commandSignature;// Indirect Drawing
233
234 CameraState camera; //Reference is updated per frame.
235 //Currently per tab, but latter we will have this per view. Since each tab can have multiple views.
236};
237
238struct DX12ResourcesPerWindow {// Presentation Logic
239 int WindowWidth = 800;//Current ViewPort ( Rendering area ) size. excluding task-bar etc.
240 int WindowHeight = 600;
241 ID3D12CommandQueue* creatorQueue = nullptr; // Track which queue this windows was created with.
242 //To assist with migrations.
243
244 ComPtr<IDXGISwapChain3> swapChain; // The link to the OS Window
245 //ComPtr<ID3D12CommandQueue> commandQueue; // Moved to OneMonitorController
246 ComPtr<ID3D12DescriptorHeap> rtvHeap;
247 ComPtr<ID3D12Resource> renderTargets[FRAMES_PER_RENDERTARGETS];
248
249 // Render To Texture Infrastructure
250 ComPtr<ID3D12Resource> renderTextures[FRAMES_PER_RENDERTARGETS];
251 ComPtr<ID3D12DescriptorHeap> rttRtvHeap;
252 ComPtr<ID3D12DescriptorHeap> rttSrvHeap;
253
254 // TODO: When we will implement HDR support, we wil have change above format to following.
255 //DXGI_FORMAT rttFormat = DXGI_FORMAT_R16G16B16A16_FLOAT; // HDR ready
256
257 ComPtr<ID3D12Resource> depthStencilBuffer;// Depth Buffer (Sized to the window dimensions)
258 ComPtr<ID3D12DescriptorHeap> dsvHeap;
259
260 D3D12_VIEWPORT viewport;// Viewport & Scissor (Dependent on Window Size).
261 D3D12_RECT scissorRect;
262
263 ComPtr<ID3D12Resource> constantBuffer;
264 ComPtr<ID3D12DescriptorHeap> cbvHeap;
265 UINT8* cbvDataBegin = nullptr;
266
267 UINT frameIndex = 0; // Remember this is different from allocatorIndex in Render Thread.
268 // It can change even during windows resize.
269};
270
271struct DX12ResourcesPerRenderThread { // This one is created 1 for each monitor.
272 // For convenience only. It simply points to OneMonitorController.commandQueue
273 ComPtr<ID3D12CommandQueue> commandQueue;
274
275 // Note that there are as many render thread as number of monitors attached.
276 // Command Allocators MUST be unique to the thread.
277 // We need one per frame-in-flight to avoid resetting while GPU is reading.
278 ComPtr<ID3D12CommandAllocator> commandAllocators[FRAMES_PER_RENDERTARGETS];
279 UINT allocatorIndex = 0; // Remember this is different from frameIndex available per Window.
280
281 // The Command List (The recording pen). Can be reset and reused for multiple windows within the same frame.
282 ComPtr<ID3D12GraphicsCommandList> commandList;
283
284 // Synchronization (Per Window VSync)
285 HANDLE fenceEvent = nullptr;
286 ComPtr<ID3D12Fence> fence; // TODO: Discard this. use the fence inside monitor.
287};

struct OneMonitorController { // Variables stored per monitor.
    // System-fetched information.
    bool isScreenInitalized = false;
    int screenPixelWidth = 800;
    int screenPixelHeight = 600;
    int screenPhysicalWidth = 0; // in mm
    int screenPhysicalHeight = 0; // in mm
    int WindowWidth = 800; // Current viewport (rendering area) size, excluding the taskbar etc.
    int WindowHeight = 600;

    HMONITOR hMonitor = NULL; // Monitor handle. Remains fixed as long as the monitor is not disconnected / disabled.
    std::wstring monitorName; // Monitor device name (e.g., "\\\\.\\DISPLAY1")
    std::wstring friendlyName; // Human-readable name (e.g., "Dell U2720Q")
    RECT monitorRect; // Full monitor rectangle
    RECT workAreaRect; // Work area (excluding the taskbar)
    int dpiX = 96; // DPI X
    int dpiY = 96; // DPI Y
    double scaleFactor = 1.0; // Scale factor (100% = 1.0, 125% = 1.25, etc.)
    bool isPrimary = false; // Is this the primary monitor?
    DWORD orientation = DMDO_DEFAULT; // Monitor orientation
    int refreshRate = 60; // Refresh rate in Hz
    int colorDepth = 32; // Color depth in bits per pixel

    bool isVirtualMonitor = false; // To support headless mode.

    // DirectX12 resources.
    // TODO: Move these to the per-render-thread structure.
    ComPtr<ID3D12CommandQueue> commandQueue; // Persistent. Survives thread restarts.
    bool hasActiveThread = false; // Whether this specific monitor is currently being serviced by a render thread.
    ComPtr<ID3D12Fence> renderFence; // Signalled each frame by GpuRenderThread.
    uint64_t renderFenceValue = 0; // Last value signalled (written by the render thread).
    // The above is intentionally NOT std::atomic, since gpu.renderFenceValue is the std::atomic serving all monitors.
    HANDLE renderFenceEvent = nullptr;
};

// Commands sent from generator thread(s) to the copy thread.
enum class CommandToCopyThreadType { NONE = 0, ADD, MODIFY, REMOVE };
struct CommandToCopyThread
{
    CommandToCopyThreadType type = CommandToCopyThreadType::NONE;
    std::optional<GeometryData> geometry; // Present for ADD and MODIFY
    uint64_t id = 0; // Always present
    uint64_t tabID = 0; // NEW: We must know which tab this object belongs to!
};

extern std::atomic<bool> pauseRenderThreads; // Defined in Main.cpp

// Packet of work for a render thread for one frame.
struct RenderPacket {
    uint64_t frameNumber = 0;
    std::vector<uint64_t> visibleObjectIds;
};

class HrException : public std::runtime_error // Simple exception helper for HRESULT checks
{
public:
    HrException(HRESULT hr) : std::runtime_error("HRESULT Exception"), hr(hr) {}
    HRESULT Error() const { return hr; }
private:
    const HRESULT hr;
};

inline void ThrowIfFailed(HRESULT hr) {
    if (FAILED(hr)) { throw HrException(hr); }
}

class ThreadSafeQueueGPU {
public:
    void push(CommandToCopyThread value) {
        std::lock_guard<std::mutex> lock(mutex);
        fifoQueue.push(std::move(value));
        cond.notify_one();
    }

    // Non-blocking pop
    bool try_pop(CommandToCopyThread& value) {
        std::lock_guard<std::mutex> lock(mutex);
        if (fifoQueue.empty()) { return false; }
        value = std::move(fifoQueue.front());
        fifoQueue.pop();
        return true;
    }

    // Shuts down the queue, waking up any waiting threads
    void shutdownQueue() {
        std::lock_guard<std::mutex> lock(mutex);
        shutdown = true;
        cond.notify_all();
    }

private:
    std::queue<CommandToCopyThread> fifoQueue; // fifo = First-In First-Out
    std::mutex mutex;
    std::condition_variable cond;
    bool shutdown = false;
};

inline ThreadSafeQueueGPU g_gpuCommandQueue;

// VRAM Manager: This class handles the GPU memory dynamically.
// There will be exactly 1 object of this class in the entire application. Hence the special name.
// "May Lord Shankar's grace remain with us" (भगवान शंकर की कृपा बनी रहे). The corresponding object is named "gpu".
class शंकर {
public:
    OneMonitorController screens[MV_MAX_MONITORS];
    int currentMonitorCount = 0; // Global monitor count. It can be 0 when no monitors are found (headless mode).

    // IDXGIFactory6 / IDXGIAdapter4 prerequisite: Windows 10 1803+ / Windows 11
    ComPtr<IDXGIFactory6> factory6; // The OS-level display system manager. Can iterate over GPUs.
    ComPtr<IDXGIAdapter4> hardwareAdapter; // Represents a physical GPU device.
    // Represents 1 logical GPU device on the above GPU adapter. Helps create all DirectX12 memory / resources / commands etc.
    ComPtr<ID3D12Device> device; // Very important: we support EXACTLY 1 GPU device only in this version.
    bool isGPUEngineInitialized = false; // TODO: To be implemented.
    DXGI_FORMAT rttFormat = DXGI_FORMAT_R8G8B8A8_UNORM;

    DX12ResourcesUI uiResources;

    // The following will be added later.
    //ID3D12DescriptorHeapMgr ← Global descriptor allocator
    //Shader & PSO cache ← Shared by all threads
    //AdapterInfo ← For device selection / VRAM stats

    /* We will have 1 render queue per monitor, which is local to the render thread.
    IMPORTANT: A GPU typically exposes a single 3D (graphics) engine. Even if 4 command lists are submitted
    to 4 independent queues, the graphics driver / WDDM schedules them onto that one engine rather than
    running them truly in parallel.
    Still, we need 4 separate queues to properly handle different refresh rates.

    Ex: If we put all 4 windows on the same queue: Window A (60Hz) submits a Present command. The queue STALLS
    waiting for Monitor A's VSync interval. Window B (144Hz) submits a draw command.
    Window B cannot be processed because the queue is blocked by Window A's VSync wait.
    By using 4 queues, Queue A can sit blocked waiting for VSync,
    while Queue B immediately pushes work to the GPU for the faster monitor. */

    std::atomic<uint64_t> renderFenceValue = 0; // Global. This is in addition to the per-monitor render fence value.

    ComPtr<ID3D12CommandQueue> copyCommandQueue; // There is only 1 across the application.
    ComPtr<ID3D12Fence> copyFence; // Synchronization for the copy queue.
    std::atomic<uint64_t> copyFenceValue = 1; // Thread-safe.
    // Start from 1 to avoid confusion with the default fence value of 0.
    HANDLE copyFenceEvent = nullptr;

public:
    // Maps our CPU ObjectID to its resource info in VRAM.
    std::unordered_map<uint64_t, GpuResourceVertexIndexInfo> resourceMap;

    // Simulates a simple heap allocator with 16 MB chunks.
    uint64_t m_nextFreeOffset = 0;
    const uint64_t CHUNK_SIZE = 16 * 1024 * 1024;
    uint64_t m_vram_capacity = 4 * CHUNK_SIZE; // Simulate 64 MB of VRAM.

    // When an object is updated, its old VRAM range is put here to be freed later.
    struct DeferredFree {
        uint64_t frameNumber; // The frame in which it became obsolete.
        GpuResourceVertexIndexInfo resource;
    };
    std::list<DeferredFree> deferredFreeQueue;

    // Allocate space in VRAM. Returns the handle. What is this used for?
    // std::optional<GpuResourceVertexIndexInfo> Allocate(size_t size);

    // Descriptor sizes for RTV and CBV/SRV/UAV. We need these to calculate offsets in descriptor heaps.
    // These are initialized during device creation and remain constant, i.e. they are hardware properties of the GPU.
    // We store them here for easy access across threads.
    UINT rtvDescriptorSize = 0, cbvSrvUavDescriptorSize = 0; // Initialized during device creation.

    void ProcessDeferredFrees(uint64_t lastCompletedRenderFrame);

    //शंकर() {}; // Our main function initializes DirectX12 global resources by calling InitD3DDeviceOnly().
    void InitD3DDeviceOnly();
    void InitD3DPerTab(DX12ResourcesPerTab& tabRes); // Call this when a new tab is created.
    void InitD3DPerWindow(DX12ResourcesPerWindow& dx, HWND hwnd, ID3D12CommandQueue* commandQueue);
    void PopulateCommandList(ID3D12GraphicsCommandList* cmdList, // Called by the per-monitor render thread.
        DX12ResourcesPerWindow& winRes, const DX12ResourcesPerTab& tabRes, TabGeometryStorage& storage);
    void WaitForPreviousFrame(const DX12ResourcesPerRenderThread& dx);
    void ResizeD3DWindow(DX12ResourcesPerWindow& dx, UINT newWidth, UINT newHeight);

    // Called when a monitor is unplugged or a window is destroyed. Destroys the swap chain / RTVs but KEEPS geometry.
    void CleanupWindowResources(DX12ResourcesPerWindow& winRes);
    // Called when a TAB is closed by the user. Destroys the jumbo vertex / index buffers.
    void CleanupTabResources(DX12ResourcesPerTab& tabRes);
    // Called ONLY at application exit (end of wWinMain). Destroys the device, factory, and global copy queue.
    // Thread resources are cleaned up by the render thread itself before exit.
    void CleanupD3DGlobal();
};

void FetchAllMonitorDetails();
BOOL CALLBACK MonitorEnumProc(HMONITOR hMonitor, HDC hdcMonitor, LPRECT lprcMonitor, LPARAM dwData);

/*
IID_PPV_ARGS is a macro used in DirectX (and COM programming in general) to retrieve interface pointers
safely and correctly during object creation or querying. It reduces repetitive typing.
COM interfaces are identified by unique GUIDs (IIDs). The macro supplies two arguments: the IID of the
pointer's interface type, and the address of the pointer converted to void**.

Ex: IID_PPV_ARGS(&device) expands to __uuidof(**(&device)), IID_PPV_ARGS_Helper(&device),
which is conceptually the same as passing:
IID iid = __uuidof(ID3D12Device);
void** ppv = reinterpret_cast<void**>(&device);
*/

// Structure to hold transformation matrices.
struct ConstantBuffer {
    DirectX::XMFLOAT4X4 viewProj; // 64 bytes
};

// Externs for communication
extern std::atomic<bool> shutdownSignal;

// Logic thread "fence"
extern std::mutex g_logicFenceMutex;
extern std::condition_variable g_logicFenceCV;
extern uint64_t g_logicFrameCount;

// Copy thread "fence"
extern std::mutex g_copyFenceMutex;
extern std::condition_variable g_copyFenceCV;
extern uint64_t g_copyFrameCount;

// TODO: Implement this. In a real allocator, we would manage free lists and possibly defragment memory.
/*
std::optional<GpuResourceVertexIndexInfo> शंकर::Allocate(size_t size) {

    if (m_nextFreeOffset + size > m_vram_capacity) {
        std::cerr << "VRAM MANAGER: Out of memory!" << std::endl;
        // Here, the main logic thread would be signaled to reduce LOD.
        return std::nullopt;
    }
    GpuResourceVertexIndexInfo info{ m_nextFreeOffset, size };
    m_nextFreeOffset += size; // Simple bump allocator
    return info;
}*/

// Utility functions

// Waits for the previous frame to complete rendering.
// Taken by reference: a by-value copy would AddRef every ComPtr, and fenceValue++ would only update the copy.
inline void WaitForGpu(DX12ResourcesPerWindow& dx)
{ // Where are we using this function?
    /*
    ThrowIfFailed(dx.commandQueue->Signal(dx.fence.Get(), dx.fenceValue));
    ThrowIfFailed(dx.fence->SetEventOnCompletion(dx.fenceValue, dx.fenceEvent));
    WaitForSingleObjectEx(dx.fenceEvent, INFINITE, FALSE);
    dx.fenceValue++;*/
}

// Waits for a specific fence value to be reached.
inline void WaitForFenceValue(const DX12ResourcesPerWindow& dx, UINT64 fenceValue)
{ // Where are we using this?
    /*
    if (dx.fence->GetCompletedValue() < fenceValue)
    {
        ThrowIfFailed(dx.fence->SetEventOnCompletion(fenceValue, dx.fenceEvent));
        WaitForSingleObjectEx(dx.fenceEvent, INFINITE, FALSE);
    }*/
}

// Thread functions
// Thread synchronization between the main logic thread and the copy thread
inline std::mutex toCopyThreadMutex;
inline std::condition_variable toCopyThreadCV;
inline std::queue<CommandToCopyThread> commandToCopyThreadQueue;

// Thread functions - declarations only!
void GpuCopyThread();
void GpuRenderThread(int monitorId, int refreshRate);
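
The comment on commandAllocators in DX12ResourcesPerRenderThread says we keep one allocator per frame in flight so that an allocator is never reset while the GPU is still reading from it. The usual D3D12 pattern is to remember the fence value signalled after each submission and, before reusing that slot, wait until the fence's completed value (ID3D12Fence::GetCompletedValue) has reached it. A minimal, platform-neutral sketch of that bookkeeping follows, assuming a monotonically increasing fence. SimulatedFence and FrameSlotRing are hypothetical names; real code would block with SetEventOnCompletion and WaitForSingleObjectEx where this sketch returns std::nullopt.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for ID3D12Fence: in real code the GPU advances this
// value and ID3D12Fence::GetCompletedValue() reads it.
struct SimulatedFence {
    uint64_t completedValue = 0;
};

// Remembers the fence value each command-allocator slot was last submitted
// with, so a slot is only handed out again once the GPU has passed that value.
template <std::size_t N>
class FrameSlotRing {
public:
    // Returns the slot to reset and record into, or std::nullopt if the GPU may
    // still be reading commands from it (real code would wait on an event here).
    std::optional<std::size_t> TryAcquire(const SimulatedFence& fence) const {
        if (fence.completedValue < slotFenceValue[index]) {
            return std::nullopt;
        }
        return index;
    }

    // Call after ExecuteCommandLists + Signal(fence, value) for the current slot.
    // Returns the fence value that was (conceptually) signalled.
    uint64_t Submit() {
        const uint64_t value = ++lastSignalled;  // fence values only increase
        slotFenceValue[index] = value;
        index = (index + 1) % N;                 // move on to the next allocator
        return value;
    }

private:
    static_assert(N > 0, "need at least one frame in flight");
    std::array<uint64_t, N> slotFenceValue{};   // 0 = never submitted, always free
    uint64_t lastSignalled = 0;
    std::size_t index = 0;
};
```

With N = 2, the third frame wants slot 0 again and must wait until the fence reports that the first submission (value 1) has completed.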
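
ThreadSafeQueueGPU owns a condition variable and a shutdown flag, and shutdownQueue() promises to wake "any waiting threads", yet the class only offers the non-blocking try_pop. A blocking pop that honours the shutdown flag could look like the sketch below. It is a generic template (T stands in for CommandToCopyThread), and the name wait_and_pop is an assumption, not an existing method. Unlike the original, it notifies after releasing the lock; both are valid, and this avoids waking a thread that would immediately block on the mutex.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Sketch of a FIFO queue with a blocking pop and a cooperative shutdown.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex);
            fifoQueue.push(std::move(value));
        }
        cond.notify_one();
    }

    // Blocks until an item is available or shutdown is requested.
    // Returns false only once the queue is shut down AND fully drained.
    bool wait_and_pop(T& value) {
        std::unique_lock<std::mutex> lock(mutex);
        cond.wait(lock, [this] { return shutdown || !fifoQueue.empty(); });
        if (fifoQueue.empty()) {
            return false;                       // shut down, nothing left
        }
        value = std::move(fifoQueue.front());
        fifoQueue.pop();
        return true;
    }

    void shutdownQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex);
            shutdown = true;
        }
        cond.notify_all();                      // wake every waiting consumer
    }

private:
    std::queue<T> fifoQueue;
    std::mutex mutex;
    std::condition_variable cond;
    bool shutdown = false;
};
```

A copy thread would then loop on `while (queue.wait_and_pop(cmd)) { ... }` and exit cleanly once shutdownQueue() has been called and the backlog is drained.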
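
ProcessDeferredFrees(lastCompletedRenderFrame) is declared in the शंकर class, but its body is not part of this listing. One possible implementation, assuming a DeferredFree entry becomes safe to release once the GPU has completed the frame recorded in frameNumber, is to sweep deferredFreeQueue and release every entry at or below that frame. GpuRange, ProcessDeferredFreesSketch and the released vector are hypothetical stand-ins for GpuResourceVertexIndexInfo, the member function, and handing the range back to the VRAM allocator.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <vector>

// Hypothetical stand-in for GpuResourceVertexIndexInfo (offset/size in VRAM).
struct GpuRange {
    uint64_t offset;
    uint64_t size;
};

struct DeferredFree {
    uint64_t frameNumber;  // frame in which the range became obsolete
    GpuRange resource;
};

// Releases every entry whose frame the GPU has finished; keeps the rest queued.
// Appending to 'released' stands in for returning the range to an allocator.
inline void ProcessDeferredFreesSketch(std::list<DeferredFree>& queue,
                                       uint64_t lastCompletedRenderFrame,
                                       std::vector<GpuRange>& released) {
    for (auto it = queue.begin(); it != queue.end();) {
        if (it->frameNumber <= lastCompletedRenderFrame) {
            released.push_back(it->resource);
            it = queue.erase(it);  // erase returns the iterator to the next entry
        } else {
            ++it;
        }
    }
}
```

A std::list keeps the erase-while-iterating loop simple; entries need not be sorted by frame, which matters if several threads retire ranges out of order.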
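
rtvDescriptorSize and cbvSrvUavDescriptorSize exist because descriptors in a heap are laid out at a fixed, device-specific stride returned by ID3D12Device::GetDescriptorHandleIncrementSize. The handle of the i-th descriptor is the heap start plus i times that stride, which is the arithmetic CD3DX12_CPU_DESCRIPTOR_HANDLE::Offset performs. The sketch reproduces it with a stand-in CpuDescriptorHandle struct (D3D12_CPU_DESCRIPTOR_HANDLE likewise wraps a single SIZE_T ptr) so it compiles without the Windows SDK; OffsetDescriptor is a hypothetical helper name. The assertion guards against using a stride before device creation has filled it in, since both members default to 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for D3D12_CPU_DESCRIPTOR_HANDLE, which wraps a SIZE_T 'ptr'.
struct CpuDescriptorHandle {
    std::size_t ptr;
};

// The i-th descriptor lives at heapStart + i * incrementSize.
inline CpuDescriptorHandle OffsetDescriptor(CpuDescriptorHandle heapStart,
                                            uint32_t index,
                                            uint32_t incrementSize) {
    // A zero stride means GetDescriptorHandleIncrementSize was never queried.
    assert(incrementSize != 0 && "descriptor size not initialized");
    return { heapStart.ptr + static_cast<std::size_t>(index) * incrementSize };
}
```

In the real code, heapStart would come from ID3D12DescriptorHeap::GetCPUDescriptorHandleForHeapStart and incrementSize from gpu.rtvDescriptorSize.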
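
ConstantBuffer holds a single 64-byte matrix, but D3D12 requires a constant buffer view's size and GPU address to be multiples of 256 bytes (D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT). So when constantBuffer is created, or when several per-frame copies are packed into one buffer, each copy should occupy its own aligned 256-byte slot. A minimal sketch of the rounding follows, using a stand-in struct of 16 floats in place of DirectX::XMFLOAT4X4; AlignUp, CbvOffsetForFrame and ConstantBufferSketch are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Value of D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT.
constexpr std::size_t kCbAlignment = 256;

// Rounds 'size' up to the next multiple of 'alignment' (must be a power of two).
constexpr std::size_t AlignUp(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// Stand-in for ConstantBuffer: XMFLOAT4X4 is 16 floats = 64 bytes.
struct ConstantBufferSketch {
    float viewProj[16];
};

static_assert(sizeof(ConstantBufferSketch) == 64, "matches the 64-byte comment");

// Byte offset of the copy used by frame 'frame' inside one shared upload buffer.
constexpr std::size_t CbvOffsetForFrame(std::size_t frame) {
    return frame * AlignUp(sizeof(ConstantBufferSketch), kCbAlignment);
}
```

The padding wastes 192 bytes per copy here, which is negligible next to the cost of a misaligned CBV being rejected by the runtime.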