@Cydral Cydral commented Jan 7, 2026

Summary

This PR addresses critical CUDA synchronization issues and enhances the test_layer utility function.

CUDA Kernel Fixes

Several CUDA kernels were using __syncthreads() for cross-block synchronization, which is incorrect: __syncthreads() only synchronizes threads within the same block, not across different blocks. When grid_stride_range_y distributes work across multiple blocks, the barrier silently fails to order writes and reads between blocks, producing data races.
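To illustrate the failure mode (a minimal sketch; the kernel name, layout, and indexing below are illustrative, not dlib's actual code):

```cuda
// BROKEN PATTERN: one kernel computes per-row norms, then tries to
// reuse them after a barrier.  __syncthreads() only synchronizes
// threads WITHIN a block, so when rows are spread across multiple
// blocks (as grid_stride_range_y does), block B may read norms[r]
// before block A has finished writing it.
__global__ void normalize_broken(float* data, float* norms, int rows, int cols)
{
    // phase 1: accumulate squared sums into norms[]
    for (int r = blockIdx.y; r < rows; r += gridDim.y)
        for (int c = threadIdx.x; c < cols; c += blockDim.x)
            atomicAdd(&norms[r], data[r * cols + c] * data[r * cols + c]);

    __syncthreads();  // BUG: this is a per-block barrier only

    // phase 2: may observe incomplete norms[] written by other blocks
    for (int r = blockIdx.y; r < rows; r += gridDim.y)
        for (int c = threadIdx.x; c < cols; c += blockDim.x)
            data[r * cols + c] /= sqrtf(norms[r]);
}
```

With a single block the code happens to work, which is why the bug can go unnoticed until the grid grows.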

Affected functions, each decomposed into separate kernels:

  • inverse_norms()
  • dot_prods()
  • multiply_conv()
  • layer_normalize()
  • rms_normalize()
  • compute_act_halt_probabilities()

The fix replaces the intra-kernel __syncthreads() with sequential launch_kernel() calls. Kernels launched on the same CUDA stream execute in order, so each launch implicitly synchronizes with the completion of the previous one across all blocks.
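The decomposition pattern can be sketched as follows (again illustrative: the helper names and launch configuration are assumptions, and dlib's launch_kernel wrapper is replaced here by raw <<<>>> launches for brevity):

```cuda
// FIXED PATTERN: split the two phases into separate kernels.
__global__ void compute_norms(const float* data, float* norms, int rows, int cols)
{
    for (int r = blockIdx.y; r < rows; r += gridDim.y)
        for (int c = threadIdx.x; c < cols; c += blockDim.x)
            atomicAdd(&norms[r], data[r * cols + c] * data[r * cols + c]);
}

__global__ void apply_norms(float* data, const float* norms, int rows, int cols)
{
    for (int r = blockIdx.y; r < rows; r += gridDim.y)
        for (int c = threadIdx.x; c < cols; c += blockDim.x)
            data[r * cols + c] /= sqrtf(norms[r]);
}

void normalize(float* data, float* norms, int rows, int cols)
{
    dim3 blocks(1, 32), threads(128, 1);
    compute_norms<<<blocks, threads>>>(data, norms, rows, cols);
    // Same stream: apply_norms cannot begin until every block of
    // compute_norms has completed, providing the cross-block barrier
    // the single-kernel version lacked.
    apply_norms<<<blocks, threads>>>(data, norms, rows, cols);
}
```

The extra launch overhead is negligible compared to the cost of a silent data race, and no device-wide synchronization primitive (e.g. cooperative groups) is required.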

test_layer Enhancement

Modified test_layer to accept optional parameters for testing layers that constrain the shape of their input tensor, enabling proper gradient verification for layers that only operate on particular input dimensions.
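Usage would look roughly like the following (a hedged sketch: the existing zero-argument form matches dlib's test suite, but the names and order of the new optional dimension parameters are hypothetical, as the PR diff is not shown here):

```cpp
#include <dlib/dnn.h>
using namespace dlib;

void check_layer_gradients()
{
    layer_normalize_ l;

    // Existing form: test_layer picks a random default-sized input.
    auto res = test_layer(l);
    DLIB_TEST_MSG(res, res);

    // Hypothetical extended form: pin the input tensor shape so a
    // layer with dimension constraints receives a valid input.
    // auto res2 = test_layer(l, /*num_samples=*/4, /*k=*/1, /*nr=*/8, /*nc=*/8);
}
```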

Related Discussion

Follow-up to #3128

Cydral and others added 30 commits April 28, 2025 22:10
@Cydral Cydral closed this Jan 7, 2026
@Cydral Cydral deleted the fixes branch January 7, 2026 12:37
