diff --git a/README.md b/README.md index 20ee451..b3d8f3c 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,41 @@ Vulkan Grass Rendering ================================== +![](img/grass!.gif) + **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 5** -* (TODO) YOUR NAME HERE -* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab) +* Gene Liu + * [LinkedIn](https://www.linkedin.com/in/gene-l-3108641a3/) +* Tested on: Windows 10, i7-9750H @ 2.60GHz, 16GB RAM, GTX 1650 Max-Q 4096MB (personal laptop) + * SM 7.5 + +# Project 5: Vulkan Grass Rendering + +This project implements a grass simulator and renderer using Vulkan. Grass dynamics and behavior are based on the [Responsive Real-Time Grass Rendering for General 3D Scenes](https://www.cg.tuwien.ac.at/research/publications/2017/JAHRMANN-2017-RRTG/JAHRMANN-2017-RRTG-draft.pdf) paper, which models grass appearance and movement. Grass appearance is based on Bezier curves: the tessellation shaders in the Vulkan pipeline form the grass blade shape according to its control points. Grass movement is influenced by a combination of gravity, grass recovery, and wind forces, then corrected to avoid unrealistic behavior such as ground clipping. The specific formulas and theorems applied are found in the paper above. + +## Blade Culling + +To further improve performance, grass blades are culled in 3 ways, each of which is also described in the paper. The first is orientation culling, which removes from rendering any grass blades viewed nearly edge-on, that is, blades whose width direction is almost parallel to the camera's view direction. Such blades are barely visible to the camera anyway, so removing them decreases the number of blades to render while maintaining visual fidelity. 
This is demonstrated below: + +![](img/orient_cull.gif) + +Next, any grass blades not within the camera frustum can also be culled, as these blades are not visible from the current viewpoint anyway. This once again decreases the amount of computation needed to render the scene at low cost. This is demonstrated below in the bottom left and right corners, where grass blades vanish as most of their area leaves the view frustum. + +![](img/view_cull.gif) + +Finally, we also cull grass blades based on their distance to the camera. Blades between the camera and a user-defined max distance (40 units in this case) are put into a user-defined number of buckets (20 in this case). In each successively farther bucket, we cull a larger fraction of the blades, selected by the id of the thread processing each blade. This reduces grass density at farther distances, where it is not needed, so fewer blades are rendered with minimal visual effect on the scene. + +![](img/dist_cull.gif) + +## Performance Analysis + +The performance of the renderer in terms of FPS was analyzed with regard to a varying number of grass blades, under different culling methods. The perspective used for the following data is the same as the one in the initial image at the beginning of this readme. + +![](img/fps_blades.jpg) + +The graph above shows the FPS at different grass blade counts under no culling, only orientation culling, only view frustum culling, only distance culling, and finally all culling methods. As expected, the FPS decreases as the number of grass blades increases regardless of the culling method, as the number of triangles to render increases and so the GPU needs to spend more time processing more entities within the compute and graphics pipelines. This decrease is sublinear in the blade count, as the x axis increases exponentially. 
-### (TODO: Your README) +Next, we see that in order of general performance we have all culling, then distance, then view frustum, then orientation, then no culling. The all culling and distance culling configurations are separated from the other 3 by a larger gap. All culling and no culling perform the best and worst respectively, as expected since the purpose of culling is to reduce computation and hence improve performance. Orientation culling likely has little impact due to the culling threshold set. This implementation culls blades if the absolute value of the dot product between the blade direction and the camera view vector is greater than 0.9, which requires both to be relatively aligned for the grass blade to be culled. This likely results in fewer grass blades culled by this method and so less of a performance gain in general. Next, view frustum culling also has a small impact on performance. This may be due to the perspective chosen above, which leaves few blades outside the camera frustum and so saves little computation. Distance culling had the most impact overall, even at a camera viewpoint relatively close to the grass. The 20 buckets chosen likely came into play here, removing fractions of the grass blades in farther buckets even at this distance, which saved larger amounts of computation and hence improved performance. -*DO NOT* leave the README to the last minute! It is a crucial part of the -project, and we will not be able to grade you without a good README. +Finally, we see that the performance of all 5 culling variants begins to converge as we increase the grass blade count. This could be due to the ratio of culled blades to the total number of blades decreasing as the viewpoint remains the same, resulting in all methods needing to compute for more similar amounts of time. 
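The distance culling rule described in the README can be sketched on the CPU as follows. This is a minimal illustration that mirrors the `DIST_CULL` test in `compute.comp`, assuming the README's values of 40 units and 20 buckets; the names `kDistMax`, `kDistBuckets`, `distanceCulled`, and `culledPerCycle` are hypothetical and do not appear in the project.

```cpp
#include <cmath>
#include <cstdint>

// Assumed to match DIST_MAX and DIST_BUCKETS in compute.comp.
constexpr float kDistMax = 40.0f;
constexpr int kDistBuckets = 20;

// Returns true when the blade processed by thread `id` is culled at the
// projected camera distance `dproj`. The farther the blade, the smaller
// the threshold, so a larger share of thread ids fails the test.
bool distanceCulled(std::uint32_t id, float dproj) {
    float threshold = std::floor(kDistBuckets * (1.0f - dproj / kDistMax));
    return static_cast<float>(id % kDistBuckets) > threshold;
}

// Counts how many of kDistBuckets consecutive thread ids are culled at
// the given distance (hypothetical helper for inspection only).
int culledPerCycle(float dproj) {
    int culled = 0;
    for (std::uint32_t id = 0; id < static_cast<std::uint32_t>(kDistBuckets); ++id) {
        if (distanceCulled(id, dproj)) {
            ++culled;
        }
    }
    return culled;
}
```

Under these assumptions, no blades are culled right at the camera, roughly half are culled at 20 units, and every blade beyond the max distance is culled.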
diff --git a/img/dist_cull.gif b/img/dist_cull.gif new file mode 100644 index 0000000..b6f24dc Binary files /dev/null and b/img/dist_cull.gif differ diff --git a/img/fps_blades.jpg b/img/fps_blades.jpg new file mode 100644 index 0000000..cd5b4dd Binary files /dev/null and b/img/fps_blades.jpg differ diff --git a/img/grass!.gif b/img/grass!.gif new file mode 100644 index 0000000..ba29468 Binary files /dev/null and b/img/grass!.gif differ diff --git a/img/grassiguess.jpg b/img/grassiguess.jpg new file mode 100644 index 0000000..8e4a55f Binary files /dev/null and b/img/grassiguess.jpg differ diff --git a/img/orient_cull.gif b/img/orient_cull.gif new file mode 100644 index 0000000..3a8b830 Binary files /dev/null and b/img/orient_cull.gif differ diff --git a/img/view_cull.gif b/img/view_cull.gif new file mode 100644 index 0000000..a1e8f89 Binary files /dev/null and b/img/view_cull.gif differ diff --git a/src/Blades.cpp b/src/Blades.cpp index 80e3d76..0142372 100644 --- a/src/Blades.cpp +++ b/src/Blades.cpp @@ -45,7 +45,7 @@ Blades::Blades(Device* device, VkCommandPool commandPool, float planeDim) : Mode indirectDraw.firstInstance = 0; BufferUtils::CreateBufferFromData(device, commandPool, blades.data(), NUM_BLADES * sizeof(Blade), VK_BUFFER_USAGE_STORAGE_BUFFER_BIT, bladesBuffer, bladesBufferMemory); - BufferUtils::CreateBuffer(device, NUM_BLADES * sizeof(Blade), VK_BUFFER_USAGE_STORAGE_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT, culledBladesBuffer, culledBladesBufferMemory); + BufferUtils::CreateBuffer(device, NUM_BLADES * sizeof(Blade), VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT, culledBladesBuffer, culledBladesBufferMemory); BufferUtils::CreateBufferFromData(device, commandPool, &indirectDraw, sizeof(BladeDrawIndirect), VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT, numBladesBuffer, numBladesBufferMemory); } diff --git a/src/Renderer.cpp b/src/Renderer.cpp index 
b445d04..a76a282 100644 --- a/src/Renderer.cpp +++ b/src/Renderer.cpp @@ -195,9 +195,38 @@ void Renderer::CreateTimeDescriptorSetLayout() { } void Renderer::CreateComputeDescriptorSetLayout() { - // TODO: Create the descriptor set layout for the compute pipeline - // Remember this is like a class definition stating why types of information - // will be stored at each binding + // Create the descriptor set layout for the compute pipeline + VkDescriptorSetLayoutBinding bladesLayoutBinding = {}; + bladesLayoutBinding.binding = 0; + bladesLayoutBinding.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER; + bladesLayoutBinding.descriptorCount = 1; + bladesLayoutBinding.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT; + bladesLayoutBinding.pImmutableSamplers = nullptr; + + VkDescriptorSetLayoutBinding culledBladesLayoutBinding = {}; + culledBladesLayoutBinding.binding = 1; + culledBladesLayoutBinding.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER; + culledBladesLayoutBinding.descriptorCount = 1; + culledBladesLayoutBinding.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT; + culledBladesLayoutBinding.pImmutableSamplers = nullptr; + + VkDescriptorSetLayoutBinding numBladesLayoutBinding = {}; + numBladesLayoutBinding.binding = 2; + numBladesLayoutBinding.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER; + numBladesLayoutBinding.descriptorCount = 1; + numBladesLayoutBinding.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT; + numBladesLayoutBinding.pImmutableSamplers = nullptr; + + std::vector<VkDescriptorSetLayoutBinding> bindings = { bladesLayoutBinding, culledBladesLayoutBinding, numBladesLayoutBinding }; + + VkDescriptorSetLayoutCreateInfo layoutInfo = {}; + layoutInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO; + layoutInfo.bindingCount = static_cast<uint32_t>(bindings.size()); + layoutInfo.pBindings = bindings.data(); + + if (vkCreateDescriptorSetLayout(logicalDevice, &layoutInfo, nullptr, &computeDescriptorSetLayout) != VK_SUCCESS) { + throw std::runtime_error("Failed to create descriptor set layout"); + } } 
void Renderer::CreateDescriptorPool() { @@ -215,7 +244,8 @@ void Renderer::CreateDescriptorPool() { // Time (compute) { VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER , 1 }, - // TODO: Add any additional types and counts of descriptors you will need to allocate + // Blade, culled blade, num blades buffers for compute + { VK_DESCRIPTOR_TYPE_STORAGE_BUFFER, static_cast<uint32_t>(3 * scene->GetBlades().size()) } }; VkDescriptorPoolCreateInfo poolInfo = {}; @@ -318,8 +348,44 @@ void Renderer::CreateModelDescriptorSets() { } void Renderer::CreateGrassDescriptorSets() { - // TODO: Create Descriptor sets for the grass. + // Create Descriptor sets for the grass. // This should involve creating descriptor sets which point to the model matrix of each group of grass blades + grassDescriptorSets.resize(scene->GetBlades().size()); + + // Describe the descriptor set + VkDescriptorSetLayout layouts[] = { modelDescriptorSetLayout }; + VkDescriptorSetAllocateInfo allocInfo = {}; + allocInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO; + allocInfo.descriptorPool = descriptorPool; + allocInfo.descriptorSetCount = static_cast<uint32_t>(grassDescriptorSets.size()); + allocInfo.pSetLayouts = layouts; + + // Allocate descriptor sets + if (vkAllocateDescriptorSets(logicalDevice, &allocInfo, grassDescriptorSets.data()) != VK_SUCCESS) { + throw std::runtime_error("Failed to allocate descriptor set"); + } + + std::vector<VkWriteDescriptorSet> descriptorWrites(grassDescriptorSets.size()); + + for (uint32_t i = 0; i < scene->GetBlades().size(); ++i) { + VkDescriptorBufferInfo modelBufferInfo = {}; + modelBufferInfo.buffer = scene->GetBlades()[i]->GetModelBuffer(); + modelBufferInfo.offset = 0; + modelBufferInfo.range = sizeof(ModelBufferObject); + + descriptorWrites[i].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET; + descriptorWrites[i].dstSet = grassDescriptorSets[i]; + descriptorWrites[i].dstBinding = 0; + descriptorWrites[i].dstArrayElement = 0; + descriptorWrites[i].descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER; + 
descriptorWrites[i].descriptorCount = 1; + descriptorWrites[i].pBufferInfo = &modelBufferInfo; + descriptorWrites[i].pImageInfo = nullptr; + descriptorWrites[i].pTexelBufferView = nullptr; + } + + // Update descriptor sets + vkUpdateDescriptorSets(logicalDevice, static_cast<uint32_t>(descriptorWrites.size()), descriptorWrites.data(), 0, nullptr); } void Renderer::CreateTimeDescriptorSet() { @@ -358,8 +424,74 @@ void Renderer::CreateTimeDescriptorSet() { } void Renderer::CreateComputeDescriptorSets() { - // TODO: Create Descriptor sets for the compute pipeline + // Create Descriptor sets for the compute pipeline // The descriptors should point to Storage buffers which will hold the grass blades, the culled grass blades, and the output number of grass blades + computeDescriptorSets.resize(scene->GetBlades().size()); + + // Describe the descriptor set + VkDescriptorSetLayout layouts[] = { computeDescriptorSetLayout }; + VkDescriptorSetAllocateInfo allocInfo = {}; + allocInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO; + allocInfo.descriptorPool = descriptorPool; + allocInfo.descriptorSetCount = static_cast<uint32_t>(computeDescriptorSets.size()); + allocInfo.pSetLayouts = layouts; + + // Allocate descriptor sets + if (vkAllocateDescriptorSets(logicalDevice, &allocInfo, computeDescriptorSets.data()) != VK_SUCCESS) { + throw std::runtime_error("Failed to allocate descriptor set"); + } + + std::vector<VkWriteDescriptorSet> descriptorWrites(3 * computeDescriptorSets.size()); + + for (uint32_t i = 0; i < scene->GetBlades().size(); ++i) { + VkDescriptorBufferInfo bladesBufferInfo = {}; + bladesBufferInfo.buffer = scene->GetBlades()[i]->GetBladesBuffer(); + bladesBufferInfo.offset = 0; + bladesBufferInfo.range = NUM_BLADES * sizeof(Blade); + + VkDescriptorBufferInfo culledBladesBufferInfo = {}; + culledBladesBufferInfo.buffer = scene->GetBlades()[i]->GetCulledBladesBuffer(); + culledBladesBufferInfo.offset = 0; + culledBladesBufferInfo.range = NUM_BLADES * sizeof(Blade); + + VkDescriptorBufferInfo 
numBladesBufferInfo = {}; + numBladesBufferInfo.buffer = scene->GetBlades()[i]->GetNumBladesBuffer(); + numBladesBufferInfo.offset = 0; + numBladesBufferInfo.range = sizeof(BladeDrawIndirect); + + descriptorWrites[3*i].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET; + descriptorWrites[3*i].dstSet = computeDescriptorSets[i]; + descriptorWrites[3*i].dstBinding = 0; + descriptorWrites[3*i].dstArrayElement = 0; + descriptorWrites[3*i].descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER; + descriptorWrites[3*i].descriptorCount = 1; + descriptorWrites[3*i].pBufferInfo = &bladesBufferInfo; + descriptorWrites[3*i].pImageInfo = nullptr; + descriptorWrites[3*i].pTexelBufferView = nullptr; + + descriptorWrites[3 * i + 1].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET; + descriptorWrites[3 * i + 1].dstSet = computeDescriptorSets[i]; + descriptorWrites[3 * i + 1].dstBinding = 1; + descriptorWrites[3 * i + 1].dstArrayElement = 0; + descriptorWrites[3 * i + 1].descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER; + descriptorWrites[3 * i + 1].descriptorCount = 1; + descriptorWrites[3 * i + 1].pBufferInfo = &culledBladesBufferInfo; + descriptorWrites[3 * i + 1].pImageInfo = nullptr; + descriptorWrites[3 * i + 1].pTexelBufferView = nullptr; + + descriptorWrites[3 * i + 2].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET; + descriptorWrites[3 * i + 2].dstSet = computeDescriptorSets[i]; + descriptorWrites[3 * i + 2].dstBinding = 2; + descriptorWrites[3 * i + 2].dstArrayElement = 0; + descriptorWrites[3 * i + 2].descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER; + descriptorWrites[3 * i + 2].descriptorCount = 1; + descriptorWrites[3 * i + 2].pBufferInfo = &numBladesBufferInfo; + descriptorWrites[3 * i + 2].pImageInfo = nullptr; + descriptorWrites[3 * i + 2].pTexelBufferView = nullptr; + } + + // Update descriptor sets + vkUpdateDescriptorSets(logicalDevice, static_cast<uint32_t>(descriptorWrites.size()), descriptorWrites.data(), 0, nullptr); } void Renderer::CreateGraphicsPipeline() { @@ -716,8 
+848,7 @@ void Renderer::CreateComputePipeline() { computeShaderStageInfo.module = computeShaderModule; computeShaderStageInfo.pName = "main"; - // TODO: Add the compute dsecriptor set layout you create to this list - std::vector<VkDescriptorSetLayout> descriptorSetLayouts = { cameraDescriptorSetLayout, timeDescriptorSetLayout }; + std::vector<VkDescriptorSetLayout> descriptorSetLayouts = { cameraDescriptorSetLayout, timeDescriptorSetLayout, computeDescriptorSetLayout }; // Create pipeline layout VkPipelineLayoutCreateInfo pipelineLayoutInfo = {}; @@ -883,7 +1014,11 @@ void Renderer::RecordComputeCommandBuffer() { // Bind descriptor set for time uniforms vkCmdBindDescriptorSets(computeCommandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, computePipelineLayout, 1, 1, &timeDescriptorSet, 0, nullptr); - // TODO: For each group of blades bind its descriptor set and dispatch + // For each group of blades bind its descriptor set and dispatch + for (int i = 0; i < scene->GetBlades().size(); i++) { + vkCmdBindDescriptorSets(computeCommandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, computePipelineLayout, 2, 1, &computeDescriptorSets[i], 0, nullptr); + vkCmdDispatch(computeCommandBuffer, (NUM_BLADES / WORKGROUP_SIZE) + 1, 1, 1); + } // ~ End recording ~ if (vkEndCommandBuffer(computeCommandBuffer) != VK_SUCCESS) { @@ -975,14 +1110,13 @@ void Renderer::RecordCommandBuffers() { for (uint32_t j = 0; j < scene->GetBlades().size(); ++j) { VkBuffer vertexBuffers[] = { scene->GetBlades()[j]->GetCulledBladesBuffer() }; VkDeviceSize offsets[] = { 0 }; - // TODO: Uncomment this when the buffers are populated - // vkCmdBindVertexBuffers(commandBuffers[i], 0, 1, vertexBuffers, offsets); + vkCmdBindVertexBuffers(commandBuffers[i], 0, 1, vertexBuffers, offsets); - // TODO: Bind the descriptor set for each grass blades model + // Bind the descriptor set for each grass blades model + vkCmdBindDescriptorSets(commandBuffers[i], VK_PIPELINE_BIND_POINT_GRAPHICS, grassPipelineLayout, 1, 1, &grassDescriptorSets[j], 0, nullptr); // Draw - // TODO: 
Uncomment this when the buffers are populated - // vkCmdDrawIndirect(commandBuffers[i], scene->GetBlades()[j]->GetNumBladesBuffer(), 0, 1, sizeof(BladeDrawIndirect)); + vkCmdDrawIndirect(commandBuffers[i], scene->GetBlades()[j]->GetNumBladesBuffer(), 0, 1, sizeof(BladeDrawIndirect)); } // End render pass @@ -1041,8 +1175,6 @@ void Renderer::Frame() { Renderer::~Renderer() { vkDeviceWaitIdle(logicalDevice); - // TODO: destroy any resources you created - vkFreeCommandBuffers(logicalDevice, graphicsCommandPool, static_cast<uint32_t>(commandBuffers.size()), commandBuffers.data()); vkFreeCommandBuffers(logicalDevice, computeCommandPool, 1, &computeCommandBuffer); @@ -1057,6 +1189,7 @@ Renderer::~Renderer() { vkDestroyDescriptorSetLayout(logicalDevice, cameraDescriptorSetLayout, nullptr); vkDestroyDescriptorSetLayout(logicalDevice, modelDescriptorSetLayout, nullptr); vkDestroyDescriptorSetLayout(logicalDevice, timeDescriptorSetLayout, nullptr); + vkDestroyDescriptorSetLayout(logicalDevice, computeDescriptorSetLayout, nullptr); vkDestroyDescriptorPool(logicalDevice, descriptorPool, nullptr); diff --git a/src/Renderer.h b/src/Renderer.h index 95e025f..36caa9b 100644 --- a/src/Renderer.h +++ b/src/Renderer.h @@ -56,12 +56,15 @@ class Renderer { VkDescriptorSetLayout cameraDescriptorSetLayout; VkDescriptorSetLayout modelDescriptorSetLayout; VkDescriptorSetLayout timeDescriptorSetLayout; + VkDescriptorSetLayout computeDescriptorSetLayout; VkDescriptorPool descriptorPool; VkDescriptorSet cameraDescriptorSet; std::vector<VkDescriptorSet> modelDescriptorSets; VkDescriptorSet timeDescriptorSet; + std::vector<VkDescriptorSet> grassDescriptorSets; + std::vector<VkDescriptorSet> computeDescriptorSets; VkPipelineLayout graphicsPipelineLayout; VkPipelineLayout grassPipelineLayout; diff --git a/src/Scene.cpp b/src/Scene.cpp index 86894f2..3181087 100644 --- a/src/Scene.cpp +++ b/src/Scene.cpp @@ -1,3 +1,4 @@ +#include <iostream> #include "Scene.h" #include "BufferUtils.h" @@ -32,6 +33,13 @@ void Scene::UpdateTime() { time.totalTime += time.deltaTime; 
memcpy(mappedData, &time, sizeof(Time)); + + //fps stdout + fps_sum -= fps_arr[fps_arr_idx]; + fps_arr[fps_arr_idx] = 1.f / time.deltaTime; + fps_sum += fps_arr[fps_arr_idx]; + fps_arr_idx = (fps_arr_idx + 1) % 100; + std::cout << (fps_sum / 100.f) << std::endl; } VkBuffer Scene::GetTimeBuffer() const { diff --git a/src/Scene.h b/src/Scene.h index 7699d78..c8f222c 100644 --- a/src/Scene.h +++ b/src/Scene.h @@ -20,6 +20,9 @@ class Scene { VkBuffer timeBuffer; VkDeviceMemory timeBufferMemory; Time time; + float fps_arr[100] = { 0 }; + int fps_arr_idx = 0; + float fps_sum = 0; void* mappedData; diff --git a/src/shaders/compute.comp b/src/shaders/compute.comp index 0fd0224..3576b15 100644 --- a/src/shaders/compute.comp +++ b/src/shaders/compute.comp @@ -2,6 +2,13 @@ #extension GL_ARB_separate_shader_objects : enable #define WORKGROUP_SIZE 32 +#define ORIENT_CULL 1 +#define VIEW_FRUST_CULL 1 +#define VF_T 0.01 +#define DIST_CULL 1 +#define DIST_MAX 40.f +#define DIST_BUCKETS 20 + layout(local_size_x = WORKGROUP_SIZE, local_size_y = 1, local_size_z = 1) in; layout(set = 0, binding = 0) uniform CameraBufferObject { @@ -21,20 +28,20 @@ struct Blade { vec4 up; }; -// TODO: Add bindings to: -// 1. Store the input blades -// 2. Write out the culled blades -// 3. Write the total number of blades remaining - -// The project is using vkCmdDrawIndirect to use a buffer as the arguments for a draw call -// This is sort of an advanced feature so we've showed you what this buffer should look like -// -// layout(set = ???, binding = ???) 
buffer NumBlades { -// uint vertexCount; // Write the number of blades remaining here -// uint instanceCount; // = 1 -// uint firstVertex; // = 0 -// uint firstInstance; // = 0 -// } numBlades; +layout(set = 2, binding = 0) buffer Blades { + Blade blades[]; +}; + +layout(set = 2, binding = 1) buffer CulledBlades { + Blade culledBlades[]; +}; + +layout(set = 2, binding = 2) buffer NumBlades { + uint vertexCount; // Write the number of blades remaining here + uint instanceCount; // = 1 + uint firstVertex; // = 0 + uint firstInstance; // = 0 +} numBlades; bool inBounds(float value, float bounds) { return (value >= -bounds) && (value <= bounds); @@ -43,14 +50,87 @@ bool inBounds(float value, float bounds) { void main() { // Reset the number of blades to 0 if (gl_GlobalInvocationID.x == 0) { - // numBlades.vertexCount = 0; + numBlades.vertexCount = 0; } barrier(); // Wait till all threads reach this point - // TODO: Apply forces on every blade and update the vertices in the buffer + // Apply forces on every blade and update the vertices in the buffer + + Blade currBlade = blades[gl_GlobalInvocationID.x]; + + vec3 v0 = currBlade.v0.xyz; + vec3 v1 = currBlade.v1.xyz; + vec3 v2 = currBlade.v2.xyz; + vec3 up = currBlade.up.xyz; + float orient = currBlade.v0.w; + float height = currBlade.v1.w; + float width = currBlade.v2.w; + float stiff = currBlade.up.w; + + //gravity + vec3 gE = vec3(0, -1.f, 0) * 2.f; + vec3 f = vec3(cos(orient), 0, sin(orient)); + vec3 gF = 0.25 * length(gE) * f; + vec3 g = gE + gF; + + //recovery + vec3 iv2 = v0 + height * up; + vec3 r = (iv2 - v2) * stiff; - // TODO: Cull blades that are too far away or not in the camera frustum and write them + //wind with random 2d wind fn + vec3 wind_dir = normalize(vec3(1, 0, 1)); + float wind_mag = cos(totalTime + 0.3 * v2.x) + sin(totalTime + 0.3 * v2.y); + vec3 wi = wind_dir * wind_mag; + float fd = 1 - abs(dot(normalize(wi), normalize(v2-v0))); + float fr = dot(v2-v0, up) / height; + float theta = fd * fr; + 
vec3 w = wi * theta; + + v2 = v2 + (g + r + w) * deltaTime; + + //state val + v2 = v2 - up * min(dot(up, v2-v0), 0); + float lproj = length(v2 - v0 - up*dot(v2-v0, up)); + v1 = v0 + height * up * max(1 - lproj/height, 0.05 * max(lproj/height, 1)); + float L = (2.f * length(v2-v0) + length(v2-v1) + length(v1-v0))/3.f; + float ratio = height/L; + v1 = v0 + ratio*(v1-v0); + v2 = v1 + ratio*(v2-v1); + currBlade.v1.xyz = v1.xyz; + currBlade.v2.xyz = v2.xyz; + blades[gl_GlobalInvocationID.x] = currBlade; + + + // Cull blades that are too far away or not in the camera frustum and write them // to the culled blades buffer - // Note: to do this, you will need to use an atomic operation to read and update numBlades.vertexCount - // You want to write the visible blades to the buffer without write conflicts between threads + #if ORIENT_CULL + vec3 dirc = inverse(camera.view)[2].xyz; + if (abs(dot(dirc, f)) > 0.9) { + return; + } + #endif + + #if VIEW_FRUST_CULL + vec3 mdpt = 0.25*v0 + 0.5*v1 + 0.25*v2; + mat4 VP = camera.proj * camera.view; + vec4 pv0 = VP * vec4(v0, 1.f); + vec4 pv2 = VP * vec4(v2, 1.f); + vec4 pm = VP * vec4(mdpt, 1.f); + + if (!(inBounds(pv0.x, pv0.w+VF_T) && inBounds(pv0.y, pv0.w+VF_T) && inBounds(pv0.z, pv0.w+VF_T)) && + !(inBounds(pv2.x, pv2.w+VF_T) && inBounds(pv2.y, pv2.w+VF_T) && inBounds(pv2.z, pv2.w+VF_T)) && + !(inBounds(pm.x, pm.w+VF_T) && inBounds(pm.y, pm.w+VF_T) && inBounds(pm.z, pm.w+VF_T))) { + return; + } + #endif + + #if DIST_CULL + vec3 cam_pos = inverse(camera.view)[3].xyz; + float dproj = length(v0 - cam_pos - up * dot(v0-cam_pos, up)); + if (mod(gl_GlobalInvocationID.x, DIST_BUCKETS) > floor(DIST_BUCKETS * (1 - dproj/DIST_MAX))) { + return; + } + #endif + + culledBlades[atomicAdd(numBlades.vertexCount, 1)] = blades[gl_GlobalInvocationID.x]; } diff --git a/src/shaders/grass.frag b/src/shaders/grass.frag index c7df157..c8553e4 100644 --- a/src/shaders/grass.frag +++ b/src/shaders/grass.frag @@ -6,12 +6,17 @@ layout(set = 0, binding = 0) 
uniform CameraBufferObject { mat4 proj; } camera; -// TODO: Declare fragment shader inputs +// Declare fragment shader inputs +layout(location = 0) in vec2 in_uv; +layout(location = 1) in vec3 in_pos; +layout(location = 2) in vec3 in_norm; layout(location = 0) out vec4 outColor; void main() { - // TODO: Compute fragment color + // Compute fragment color - outColor = vec4(1.0); + vec4 tip_col = vec4(0, 0.8, 0, 1); + vec4 root_col = vec4(0, 0.1, 0.05, 1); + outColor = root_col + (in_uv.y) * (tip_col - root_col); } diff --git a/src/shaders/grass.tesc b/src/shaders/grass.tesc index f9ffd07..1138a63 100644 --- a/src/shaders/grass.tesc +++ b/src/shaders/grass.tesc @@ -8,19 +8,32 @@ layout(set = 0, binding = 0) uniform CameraBufferObject { mat4 proj; } camera; -// TODO: Declare tessellation control shader inputs and outputs +// Declare tessellation control shader inputs and outputs +layout(location = 0) in vec4 in_v0[]; +layout(location = 1) in vec4 in_v1[]; +layout(location = 2) in vec4 in_v2[]; +layout(location = 3) in vec4 in_up[]; + +layout(location = 0) out vec4 out_v0[]; +layout(location = 1) out vec4 out_v1[]; +layout(location = 2) out vec4 out_v2[]; +layout(location = 3) out vec4 out_up[]; void main() { // Don't move the origin location of the patch gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position; - // TODO: Write any shader outputs + // Write any shader outputs + out_v0[gl_InvocationID] = in_v0[gl_InvocationID]; + out_v1[gl_InvocationID] = in_v1[gl_InvocationID]; + out_v2[gl_InvocationID] = in_v2[gl_InvocationID]; + out_up[gl_InvocationID] = in_up[gl_InvocationID]; - // TODO: Set level of tesselation - // gl_TessLevelInner[0] = ??? - // gl_TessLevelInner[1] = ??? - // gl_TessLevelOuter[0] = ??? - // gl_TessLevelOuter[1] = ??? - // gl_TessLevelOuter[2] = ??? - // gl_TessLevelOuter[3] = ??? 
+ // Set level of tessellation + gl_TessLevelInner[0] = 8; + gl_TessLevelInner[1] = 8; + gl_TessLevelOuter[0] = 8; + gl_TessLevelOuter[1] = 8; + gl_TessLevelOuter[2] = 8; + gl_TessLevelOuter[3] = 8; } diff --git a/src/shaders/grass.tese b/src/shaders/grass.tese index 751fff6..46ba5b7 100644 --- a/src/shaders/grass.tese +++ b/src/shaders/grass.tese @@ -8,11 +8,39 @@ layout(set = 0, binding = 0) uniform CameraBufferObject { mat4 proj; } camera; -// TODO: Declare tessellation evaluation shader inputs and outputs +// Declare tessellation evaluation shader inputs and outputs +layout(location = 0) in vec4 in_v0[]; +layout(location = 1) in vec4 in_v1[]; +layout(location = 2) in vec4 in_v2[]; +layout(location = 3) in vec4 in_up[]; + +layout(location = 0) out vec2 out_uv; +layout(location = 1) out vec3 out_pos; +layout(location = 2) out vec3 out_norm; void main() { float u = gl_TessCoord.x; float v = gl_TessCoord.y; - // TODO: Use u and v to parameterize along the grass blade and output positions for each vertex of the grass blade + // Use u and v to parameterize along the grass blade and output positions for each vertex of the grass blade + vec3 v0 = in_v0[0].xyz; + vec3 v1 = in_v1[0].xyz; + vec3 v2 = in_v2[0].xyz; + float orient = in_v0[0].w; + float w = in_v2[0].w; + + vec3 t1 = vec3(cos(orient), 0, sin(orient)); + + float t = u - u * v + 0.5 * v; + vec3 a = v0 + v * (v1 - v0); + vec3 b = v1 + v * (v2 - v1); + vec3 c = a + v * (b - a); + vec3 c0 = c - w * t1; + vec3 c1 = c + w * t1; + vec3 t0 = normalize(b - a); + out_norm = normalize(cross(t0, t1)); + out_pos = (1 - t) * c0 + t * c1; + out_uv = vec2(u, v); + + gl_Position = camera.proj * camera.view * vec4(out_pos, 1.0f); } diff --git a/src/shaders/grass.vert b/src/shaders/grass.vert index db9dfe9..ace7487 100644 --- a/src/shaders/grass.vert +++ b/src/shaders/grass.vert @@ -6,12 +6,29 @@ layout(set = 1, binding = 0) uniform ModelBufferObject { mat4 model; }; -// TODO: Declare 
vertex shader inputs and outputs +layout(location = 0) in vec4 in_v0; +layout(location = 1) in vec4 in_v1; +layout(location = 2) in vec4 in_v2; +layout(location = 3) in vec4 in_up; + +layout(location = 0) out vec4 out_v0; +layout(location = 1) out vec4 out_v1; +layout(location = 2) out vec4 out_v2; +layout(location = 3) out vec4 out_up; -out gl_PerVertex { - vec4 gl_Position; -}; void main() { - // TODO: Write gl_Position and any other shader outputs + // Write gl_Position and any other shader outputs + vec4 tv0 = model * vec4(in_v0.xyz, 1.f); + vec4 tv1 = model * vec4(in_v1.xyz, 1.f); + vec4 tv2 = model * vec4(in_v2.xyz, 1.f); + vec4 tup = model * vec4(in_up.xyz, 0.f); + + out_v0 = vec4(tv0.xyz/tv0.w, in_v0.w); + out_v1 = vec4(tv1.xyz/tv1.w, in_v1.w); + out_v2 = vec4(tv2.xyz/tv2.w, in_v2.w); + out_up = vec4(tup.xyz, in_up.w); + + gl_Position = model * vec4(in_v0.xyz, 1.f); }
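As a reading aid for the `grass.tese` changes in this diff, the curve point `c` computed there is a De Casteljau evaluation of the quadratic Bezier curve through `v0`, `v1`, `v2` at parameter `v`. A minimal CPU sketch follows, assuming a hypothetical `Vec3` type with helper functions `lerp` and `bladeCurvePoint` that are not part of the project.

```cpp
// Hypothetical minimal 3D vector type for illustration only.
struct Vec3 {
    float x, y, z;
};

// Linear interpolation p + t * (q - p), the same form the shader uses.
Vec3 lerp(const Vec3& p, const Vec3& q, float t) {
    return { p.x + t * (q.x - p.x),
             p.y + t * (q.y - p.y),
             p.z + t * (q.z - p.z) };
}

// De Casteljau evaluation of a quadratic Bezier curve:
// two interpolations between the control points, then one between the results.
Vec3 bladeCurvePoint(const Vec3& v0, const Vec3& v1, const Vec3& v2, float v) {
    Vec3 a = lerp(v0, v1, v);  // matches `a` in grass.tese
    Vec3 b = lerp(v1, v2, v);  // matches `b` in grass.tese
    return lerp(a, b, v);      // matches `c` in grass.tese
}
```

At `v = 0` the result is the blade root `v0` and at `v = 1` it is the tip `v2`; the shader then offsets this point along the blade's width direction to form the two edges.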