39 changes: 34 additions & 5 deletions README.md
@@ -1,12 +1,41 @@
Vulkan Grass Rendering
==================================

![](img/grass!.gif)

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 5**

* (TODO) YOUR NAME HERE
* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
* Gene Liu
* [LinkedIn](https://www.linkedin.com/in/gene-l-3108641a3/)
* Tested on: Windows 10, i7-9750H @ 2.60GHz, 16GB RAM, GTX 1650 Max-Q 4096MB (personal laptop)
* SM 7.5

# Project 5: Vulkan Grass Rendering

This project implements a grass simulator and renderer using Vulkan. Grass dynamics and behavior are based on the paper [Responsive Real-Time Grass Rendering for General 3D Scenes](https://www.cg.tuwien.ac.at/research/publications/2017/JAHRMANN-2017-RRTG/JAHRMANN-2017-RRTG-draft.pdf), which models grass appearance and movement. Each blade is represented as a Bezier curve, and the tessellation shaders in the Vulkan pipeline form the blade shape from the curve's control points. Grass movement is driven by a combination of gravity, recovery, and wind forces, and the result is then corrected to avoid unrealistic behavior such as clipping through the ground. The specific formulas and theorems applied can be found in the paper above.
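
As a rough illustration of the curve evaluation, the sketch below evaluates a quadratic Bezier curve with De Casteljau's algorithm on the CPU. It assumes three control points per blade, as in the paper. The `Vec3` type and the `lerp` and `evalBladeCurve` names are hypothetical and only serve this example; the project itself does this work in its tessellation shaders, not in C++.

```cpp
// Minimal 3D vector for illustration; a real renderer would likely use glm::vec3.
struct Vec3 {
    float x, y, z;
};

// Linear interpolation between two points.
inline Vec3 lerp(const Vec3& a, const Vec3& b, float t) {
    return { a.x + t * (b.x - a.x), a.y + t * (b.y - a.y), a.z + t * (b.z - a.z) };
}

// Evaluate a quadratic Bezier curve at parameter t in [0, 1]
// using De Casteljau's algorithm (two rounds of interpolation).
inline Vec3 evalBladeCurve(const Vec3& v0, const Vec3& v1, const Vec3& v2, float t) {
    const Vec3 a = lerp(v0, v1, t);  // point on the first control segment
    const Vec3 b = lerp(v1, v2, t);  // point on the second control segment
    return lerp(a, b, t);            // point on the curve
}
```

At `t = 0` the function returns the blade base `v0`, and at `t = 1` it returns the tip `v2`; the middle control point only shapes the bend in between.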

## Blade Culling

To further improve performance, grass blades are culled in three ways, each of which is described in the paper. The first is orientation culling, which removes blades whose direction is nearly parallel to the camera's view vector before they reach the graphics pipeline. Such blades are barely visible to the camera anyway, so removing them reduces the number of blades to render while maintaining visual fidelity. This is demonstrated below:

![](img/orient_cull.gif)
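
The orientation test can be sketched in C++ as follows. This is a minimal sketch, not the project's shader code: the `Vec3` helpers and the `orientationCulled` name are hypothetical, and taking the absolute value of the dot product (so that blades facing either way are treated the same) is an assumption borrowed from the paper. The default threshold of 0.9 is the value this README reports.

```cpp
#include <cmath>

// Minimal 3D vector for illustration; a real renderer would likely use glm::vec3.
struct Vec3 {
    float x, y, z;
};

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Assumes v is not the zero vector.
inline Vec3 normalize(const Vec3& v) {
    const float len = std::sqrt(dot(v, v));
    return { v.x / len, v.y / len, v.z / len };
}

// Returns true when the blade should be culled, i.e. when the blade direction
// and the view direction are aligned beyond the threshold.
// Assumption: the absolute value is used so the sign of either vector does not matter.
inline bool orientationCulled(const Vec3& viewDir, const Vec3& bladeDir, float threshold = 0.9f) {
    const float alignment = std::fabs(dot(normalize(viewDir), normalize(bladeDir)));
    return alignment > threshold;
}
```

A lower threshold culls more blades at the cost of more visible popping as the camera rotates.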

Next, any grass blades outside the camera frustum are culled, as they are not visible from the current viewpoint. This again reduces the computation needed to render the scene, at low cost. This is demonstrated below in the bottom left and right corners, where grass blades vanish as most of their area leaves the view frustum.

![](img/view_cull.gif)
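
One way to express the frustum test is sketched below, assuming the sample points have already been transformed into clip space by the view-projection matrix. The names `Vec4`, `insideFrustum`, and `frustumCulled`, the choice of three sample points per blade, and the `tolerance` parameter are assumptions for this example (the paper describes a test along these lines). The depth axis is tested symmetrically here for simplicity; with a depth convention of `0 <= z <= w` the lower bound could be tightened.

```cpp
// Homogeneous clip-space point (after multiplication by the view-projection matrix).
struct Vec4 {
    float x, y, z, w;
};

// True when value lies in [-bound, bound].
inline bool inBounds(float value, float bound) {
    return value >= -bound && value <= bound;
}

// True when a clip-space point lies inside the frustum, widened by a small
// tolerance so blades near the screen edge do not pop in and out.
inline bool insideFrustum(const Vec4& clip, float tolerance) {
    const float h = clip.w + tolerance;
    return inBounds(clip.x, h) && inBounds(clip.y, h) && inBounds(clip.z, h);
}

// Cull a blade only when all of its sample points (here: base, a midpoint
// on the curve, and tip) are outside the frustum.
inline bool frustumCulled(const Vec4& baseClip, const Vec4& midClip, const Vec4& tipClip, float tolerance) {
    return !insideFrustum(baseClip, tolerance) &&
           !insideFrustum(midClip, tolerance) &&
           !insideFrustum(tipClip, tolerance);
}
```

Testing several points per blade keeps a blade alive while any sampled part of it is still on screen, which matches the behavior in the GIF above where blades vanish only once most of their area has left the view.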

Finally, blades are also culled based on their distance to the camera. Blades between the camera and a user-defined maximum distance (40 units here) are sorted into a user-defined number of buckets (20 here). Each successively farther bucket culls a larger fraction of its blades, selected by the id of the thread processing the blade. This lowers grass density at far distances, where it is not needed, so fewer blades are rendered with minimal visual impact on the scene.

![](img/dist_cull.gif)
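
The bucket scheme can be sketched as below. This follows one common formulation from the paper and is not necessarily the exact shader logic used here: the `distanceCulled` name is hypothetical, and culling every blade at or beyond the maximum distance is an assumption. The defaults of 40 units and 20 buckets are the values stated above.

```cpp
#include <cmath>
#include <cstdint>

// Returns true when the blade should be culled.
// bladeId     : index of the blade (e.g. the compute thread id)
// distance    : non-negative distance from the camera to the blade
// maxDistance : distance at which all blades are culled (assumption)
// numBuckets  : number of distance buckets
inline bool distanceCulled(std::uint32_t bladeId, float distance,
                           float maxDistance = 40.0f, std::uint32_t numBuckets = 20) {
    if (distance >= maxDistance) {
        return true;  // beyond the last bucket: nothing is kept
    }
    // Fraction of blades to keep shrinks linearly with distance.
    const float keptFraction = 1.0f - distance / maxDistance;
    // Number of id residues (out of numBuckets) that survive at this distance.
    const std::uint32_t kept =
        static_cast<std::uint32_t>(std::floor(static_cast<float>(numBuckets) * keptFraction));
    return (bladeId % numBuckets) >= kept;
}
```

Because the decision depends only on the blade id and its distance, the same blades are culled every frame at a given distance, which avoids flickering.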

## Performance Analysis

The renderer's performance, measured in FPS, was analyzed for varying numbers of grass blades under different culling methods. The camera perspective used for the following data is the same as in the image at the top of this README.

![](img/fps_blades.jpg)

The graph above shows the FPS at different grass blade counts with no culling, only orientation culling, only view frustum culling, only distance culling, and all culling methods combined. As expected, the FPS decreases as the number of grass blades increases regardless of the culling method: there are more triangles to render, so the GPU spends more time processing entities in the compute and graphics pipelines. Because the blade count on the x-axis increases exponentially, this decrease is sublinear with respect to the blade count.

### (TODO: Your README)
Next, ordered from best to worst general performance, we have all culling, then distance, then view frustum, then orientation, then no culling. The gap between all culling and distance culling on one side and the other three methods on the other is the largest. All culling and no culling perform best and worst respectively, as expected, since the purpose of culling is to reduce computation and hence improve performance. Orientation culling likely has little impact because of the culling threshold chosen. This implementation culls a blade only if the dot product between the blade direction and the camera view vector is greater than 0.9, which requires the two to be closely aligned. As a result, this method likely culls few blades and yields a small performance gain. View frustum culling also has a small impact on performance. This may be due to the perspective chosen above, which leaves few blades outside the camera frustum and so saves little computation. Distance culling had the most impact overall, even with the camera relatively close to the grass. The 20 buckets likely came into play here, removing a fraction of the blades in the farther buckets even at this distance, which saved more computation and hence improved performance.

*DO NOT* leave the README to the last minute! It is a crucial part of the
project, and we will not be able to grade you without a good README.
Finally, the performance of all 5 culling variants begins to converge as the grass blade count increases. This could be because the ratio of culled blades to total blades decreases as the viewpoint remains the same, so all methods end up spending more similar amounts of time on computation.
Binary file added img/dist_cull.gif
Binary file added img/fps_blades.jpg
Binary file added img/grass!.gif
Binary file added img/grassiguess.jpg
Binary file added img/orient_cull.gif
Binary file added img/view_cull.gif
2 changes: 1 addition & 1 deletion src/Blades.cpp
@@ -45,7 +45,7 @@ Blades::Blades(Device* device, VkCommandPool commandPool, float planeDim) : Mode
indirectDraw.firstInstance = 0;

BufferUtils::CreateBufferFromData(device, commandPool, blades.data(), NUM_BLADES * sizeof(Blade), VK_BUFFER_USAGE_STORAGE_BUFFER_BIT, bladesBuffer, bladesBufferMemory);
BufferUtils::CreateBuffer(device, NUM_BLADES * sizeof(Blade), VK_BUFFER_USAGE_STORAGE_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT, culledBladesBuffer, culledBladesBufferMemory);
BufferUtils::CreateBuffer(device, NUM_BLADES * sizeof(Blade), VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT, culledBladesBuffer, culledBladesBufferMemory);
BufferUtils::CreateBufferFromData(device, commandPool, &indirectDraw, sizeof(BladeDrawIndirect), VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT, numBladesBuffer, numBladesBufferMemory);
}

165 changes: 149 additions & 16 deletions src/Renderer.cpp
@@ -195,9 +195,38 @@ void Renderer::CreateTimeDescriptorSetLayout() {
}

void Renderer::CreateComputeDescriptorSetLayout() {
// TODO: Create the descriptor set layout for the compute pipeline
// Remember this is like a class definition stating why types of information
// will be stored at each binding
// Create the descriptor set layout for the compute pipeline
VkDescriptorSetLayoutBinding bladesLayoutBinding = {};
bladesLayoutBinding.binding = 0;
bladesLayoutBinding.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
bladesLayoutBinding.descriptorCount = 1;
bladesLayoutBinding.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
bladesLayoutBinding.pImmutableSamplers = nullptr;

VkDescriptorSetLayoutBinding culledBladesLayoutBinding = {};
culledBladesLayoutBinding.binding = 1;
culledBladesLayoutBinding.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
culledBladesLayoutBinding.descriptorCount = 1;
culledBladesLayoutBinding.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
culledBladesLayoutBinding.pImmutableSamplers = nullptr;

VkDescriptorSetLayoutBinding numBladesLayoutBinding = {};
numBladesLayoutBinding.binding = 2;
numBladesLayoutBinding.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
numBladesLayoutBinding.descriptorCount = 1;
numBladesLayoutBinding.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
numBladesLayoutBinding.pImmutableSamplers = nullptr;

std::vector<VkDescriptorSetLayoutBinding> bindings = { bladesLayoutBinding, culledBladesLayoutBinding, numBladesLayoutBinding };

VkDescriptorSetLayoutCreateInfo layoutInfo = {};
layoutInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
layoutInfo.bindingCount = static_cast<uint32_t>(bindings.size());
layoutInfo.pBindings = bindings.data();

if (vkCreateDescriptorSetLayout(logicalDevice, &layoutInfo, nullptr, &computeDescriptorSetLayout) != VK_SUCCESS) {
throw std::runtime_error("Failed to create descriptor set layout");
}
}

void Renderer::CreateDescriptorPool() {
@@ -215,7 +244,8 @@ void Renderer::CreateDescriptorPool() {
// Time (compute)
{ VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER , 1 },

// TODO: Add any additional types and counts of descriptors you will need to allocate
// Blade, culled blade, num blades buffers for compute
{ VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, static_cast<uint32_t>(3 * scene->GetBlades().size()) }
};

VkDescriptorPoolCreateInfo poolInfo = {};
@@ -318,8 +348,44 @@ void Renderer::CreateModelDescriptorSets() {
}

void Renderer::CreateGrassDescriptorSets() {
// TODO: Create Descriptor sets for the grass.
// Create Descriptor sets for the grass.
// This should involve creating descriptor sets which point to the model matrix of each group of grass blades
grassDescriptorSets.resize(scene->GetBlades().size());

// Describe the descriptor sets; pSetLayouts must supply one layout per set being allocated
std::vector<VkDescriptorSetLayout> layouts(grassDescriptorSets.size(), modelDescriptorSetLayout);
VkDescriptorSetAllocateInfo allocInfo = {};
allocInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
allocInfo.descriptorPool = descriptorPool;
allocInfo.descriptorSetCount = static_cast<uint32_t>(grassDescriptorSets.size());
allocInfo.pSetLayouts = layouts.data();

// Allocate descriptor sets
if (vkAllocateDescriptorSets(logicalDevice, &allocInfo, grassDescriptorSets.data()) != VK_SUCCESS) {
throw std::runtime_error("Failed to allocate descriptor set");
}

// The buffer infos must outlive the loop, since vkUpdateDescriptorSets reads them through pBufferInfo
std::vector<VkDescriptorBufferInfo> modelBufferInfos(grassDescriptorSets.size());
std::vector<VkWriteDescriptorSet> descriptorWrites(grassDescriptorSets.size());

for (uint32_t i = 0; i < scene->GetBlades().size(); ++i) {
modelBufferInfos[i].buffer = scene->GetBlades()[i]->GetModelBuffer();
modelBufferInfos[i].offset = 0;
modelBufferInfos[i].range = sizeof(ModelBufferObject);

descriptorWrites[i].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
descriptorWrites[i].dstSet = grassDescriptorSets[i];
descriptorWrites[i].dstBinding = 0;
descriptorWrites[i].dstArrayElement = 0;
descriptorWrites[i].descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
descriptorWrites[i].descriptorCount = 1;
descriptorWrites[i].pBufferInfo = &modelBufferInfos[i];
descriptorWrites[i].pImageInfo = nullptr;
descriptorWrites[i].pTexelBufferView = nullptr;
}

// Update descriptor sets
vkUpdateDescriptorSets(logicalDevice, static_cast<uint32_t>(descriptorWrites.size()), descriptorWrites.data(), 0, nullptr);
}

void Renderer::CreateTimeDescriptorSet() {
@@ -358,8 +424,74 @@ void Renderer::CreateTimeDescriptorSet() {
}

void Renderer::CreateComputeDescriptorSets() {
// TODO: Create Descriptor sets for the compute pipeline
// Create Descriptor sets for the compute pipeline
// The descriptors should point to Storage buffers which will hold the grass blades, the culled grass blades, and the output number of grass blades
computeDescriptorSets.resize(scene->GetBlades().size());

// Describe the descriptor sets; pSetLayouts must supply one layout per set being allocated
std::vector<VkDescriptorSetLayout> layouts(computeDescriptorSets.size(), computeDescriptorSetLayout);
VkDescriptorSetAllocateInfo allocInfo = {};
allocInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
allocInfo.descriptorPool = descriptorPool;
allocInfo.descriptorSetCount = static_cast<uint32_t>(computeDescriptorSets.size());
allocInfo.pSetLayouts = layouts.data();

// Allocate descriptor sets
if (vkAllocateDescriptorSets(logicalDevice, &allocInfo, computeDescriptorSets.data()) != VK_SUCCESS) {
throw std::runtime_error("Failed to allocate descriptor set");
}

// Three storage buffers per blade group: blades (binding 0), culled blades (binding 1), draw count (binding 2).
// The buffer infos must outlive the loop, since vkUpdateDescriptorSets reads them through pBufferInfo.
std::vector<VkDescriptorBufferInfo> bufferInfos(3 * computeDescriptorSets.size());
std::vector<VkWriteDescriptorSet> descriptorWrites(3 * computeDescriptorSets.size());

for (uint32_t i = 0; i < scene->GetBlades().size(); ++i) {
bufferInfos[3 * i].buffer = scene->GetBlades()[i]->GetBladesBuffer();
bufferInfos[3 * i].offset = 0;
bufferInfos[3 * i].range = NUM_BLADES * sizeof(Blade);

bufferInfos[3 * i + 1].buffer = scene->GetBlades()[i]->GetCulledBladesBuffer();
bufferInfos[3 * i + 1].offset = 0;
bufferInfos[3 * i + 1].range = NUM_BLADES * sizeof(Blade);

bufferInfos[3 * i + 2].buffer = scene->GetBlades()[i]->GetNumBladesBuffer();
bufferInfos[3 * i + 2].offset = 0;
bufferInfos[3 * i + 2].range = sizeof(BladeDrawIndirect);

// One write per binding; the binding index matches the position within the group of three
for (uint32_t b = 0; b < 3; ++b) {
VkWriteDescriptorSet& write = descriptorWrites[3 * i + b];
write.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
write.dstSet = computeDescriptorSets[i];
write.dstBinding = b;
write.dstArrayElement = 0;
write.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
write.descriptorCount = 1;
write.pBufferInfo = &bufferInfos[3 * i + b];
write.pImageInfo = nullptr;
write.pTexelBufferView = nullptr;
}
}

// Update descriptor sets
vkUpdateDescriptorSets(logicalDevice, static_cast<uint32_t>(descriptorWrites.size()), descriptorWrites.data(), 0, nullptr);
}

void Renderer::CreateGraphicsPipeline() {
@@ -716,8 +848,7 @@ void Renderer::CreateComputePipeline() {
computeShaderStageInfo.module = computeShaderModule;
computeShaderStageInfo.pName = "main";

// TODO: Add the compute dsecriptor set layout you create to this list
std::vector<VkDescriptorSetLayout> descriptorSetLayouts = { cameraDescriptorSetLayout, timeDescriptorSetLayout };
std::vector<VkDescriptorSetLayout> descriptorSetLayouts = { cameraDescriptorSetLayout, timeDescriptorSetLayout, computeDescriptorSetLayout };

// Create pipeline layout
VkPipelineLayoutCreateInfo pipelineLayoutInfo = {};
@@ -883,7 +1014,11 @@ void Renderer::RecordComputeCommandBuffer() {
// Bind descriptor set for time uniforms
vkCmdBindDescriptorSets(computeCommandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, computePipelineLayout, 1, 1, &timeDescriptorSet, 0, nullptr);

// TODO: For each group of blades bind its descriptor set and dispatch
// For each group of blades, bind its descriptor set and dispatch enough workgroups to cover every blade
for (size_t i = 0; i < scene->GetBlades().size(); ++i) {
vkCmdBindDescriptorSets(computeCommandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, computePipelineLayout, 2, 1, &computeDescriptorSets[i], 0, nullptr);
// Ceiling division: avoids an extra, empty workgroup when NUM_BLADES is a multiple of WORKGROUP_SIZE
vkCmdDispatch(computeCommandBuffer, (NUM_BLADES + WORKGROUP_SIZE - 1) / WORKGROUP_SIZE, 1, 1);
}

// ~ End recording ~
if (vkEndCommandBuffer(computeCommandBuffer) != VK_SUCCESS) {
@@ -975,14 +1110,13 @@ void Renderer::RecordCommandBuffers() {
for (uint32_t j = 0; j < scene->GetBlades().size(); ++j) {
VkBuffer vertexBuffers[] = { scene->GetBlades()[j]->GetCulledBladesBuffer() };
VkDeviceSize offsets[] = { 0 };
// TODO: Uncomment this when the buffers are populated
// vkCmdBindVertexBuffers(commandBuffers[i], 0, 1, vertexBuffers, offsets);
vkCmdBindVertexBuffers(commandBuffers[i], 0, 1, vertexBuffers, offsets);

// TODO: Bind the descriptor set for each grass blades model
// Bind the descriptor set for each grass blades model
vkCmdBindDescriptorSets(commandBuffers[i], VK_PIPELINE_BIND_POINT_GRAPHICS, grassPipelineLayout, 1, 1, &grassDescriptorSets[j], 0, nullptr);

// Draw
// TODO: Uncomment this when the buffers are populated
// vkCmdDrawIndirect(commandBuffers[i], scene->GetBlades()[j]->GetNumBladesBuffer(), 0, 1, sizeof(BladeDrawIndirect));
vkCmdDrawIndirect(commandBuffers[i], scene->GetBlades()[j]->GetNumBladesBuffer(), 0, 1, sizeof(BladeDrawIndirect));
}

// End render pass
@@ -1041,8 +1175,6 @@ void Renderer::Frame() {
Renderer::~Renderer() {
vkDeviceWaitIdle(logicalDevice);

// TODO: destroy any resources you created

vkFreeCommandBuffers(logicalDevice, graphicsCommandPool, static_cast<uint32_t>(commandBuffers.size()), commandBuffers.data());
vkFreeCommandBuffers(logicalDevice, computeCommandPool, 1, &computeCommandBuffer);

@@ -1057,6 +1189,7 @@ Renderer::~Renderer() {
vkDestroyDescriptorSetLayout(logicalDevice, cameraDescriptorSetLayout, nullptr);
vkDestroyDescriptorSetLayout(logicalDevice, modelDescriptorSetLayout, nullptr);
vkDestroyDescriptorSetLayout(logicalDevice, timeDescriptorSetLayout, nullptr);
vkDestroyDescriptorSetLayout(logicalDevice, computeDescriptorSetLayout, nullptr);

vkDestroyDescriptorPool(logicalDevice, descriptorPool, nullptr);

3 changes: 3 additions & 0 deletions src/Renderer.h
@@ -56,12 +56,15 @@ class Renderer {
VkDescriptorSetLayout cameraDescriptorSetLayout;
VkDescriptorSetLayout modelDescriptorSetLayout;
VkDescriptorSetLayout timeDescriptorSetLayout;
VkDescriptorSetLayout computeDescriptorSetLayout;

VkDescriptorPool descriptorPool;

VkDescriptorSet cameraDescriptorSet;
std::vector<VkDescriptorSet> modelDescriptorSets;
VkDescriptorSet timeDescriptorSet;
std::vector<VkDescriptorSet> grassDescriptorSets;
std::vector<VkDescriptorSet> computeDescriptorSets;

VkPipelineLayout graphicsPipelineLayout;
VkPipelineLayout grassPipelineLayout;
8 changes: 8 additions & 0 deletions src/Scene.cpp
@@ -1,3 +1,4 @@
#include <iostream>
#include "Scene.h"
#include "BufferUtils.h"

@@ -32,6 +33,13 @@ void Scene::UpdateTime() {
time.totalTime += time.deltaTime;

memcpy(mappedData, &time, sizeof(Time));

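// Print the FPS averaged over the last 100 frames; fps_arr is a ring buffer of per-frame FPS samples.
// The average under-reports until the buffer has been filled once.
fps_sum -= fps_arr[fps_arr_idx];
fps_arr[fps_arr_idx] = time.deltaTime > 0.f ? 1.f / time.deltaTime : 0.f; // guard against a zero-length frame
fps_sum += fps_arr[fps_arr_idx];
fps_arr_idx = (fps_arr_idx + 1) % 100;
// '\n' instead of std::endl avoids forcing a flush of stdout every frame
std::cout << (fps_sum / 100.f) << '\n';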
}

VkBuffer Scene::GetTimeBuffer() const {
3 changes: 3 additions & 0 deletions src/Scene.h
@@ -20,6 +20,9 @@ class Scene {
VkBuffer timeBuffer;
VkDeviceMemory timeBufferMemory;
Time time;
float fps_arr[100] = { 0 };
int fps_arr_idx = 0;
float fps_sum = 0;

void* mappedData;
