Intuition about projecting input embedding tokens to queries, keys, and values #919
krishnan-duraisamy started this conversation in General
Hi there,
This would be in the
Actually, there is no splitting here; they are all separate matrices. Maybe this is easier to see in figure 3.18. By splitting, do you perhaps mean the separate heads in figure 3.24? Those exist so that the model has separate channels that can learn independently, similar to the channels in a convolutional neural network.
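To make the "separate heads" point concrete, here is a minimal NumPy sketch (NumPy in place of PyTorch, and all sizes and names are hypothetical). It assumes each head owns its own query matrix, then shows that stacking those per-head matrices column-wise into one wide matrix, projecting once, and reshaping the output into heads gives exactly the same per-head queries. This thread does not say which of the two forms any particular figure uses; the sketch only demonstrates that they are equivalent.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility (assumption)

# Hypothetical sizes: 4 tokens, 6-dim embeddings, 2 heads of size 3
num_tokens, d_in = 4, 6
num_heads, head_dim = 2, 3
d_out = num_heads * head_dim

x = rng.random((num_tokens, d_in))  # embedded input tokens, one row per token

# Option A: every head has its own, fully separate query matrix
W_q_heads = [rng.random((d_in, head_dim)) for _ in range(num_heads)]
q_separate = [x @ W for W in W_q_heads]  # each result has shape (num_tokens, head_dim)

# Option B: concatenate the per-head matrices into one (d_in, d_out) matrix,
# project once, then split the output columns back into heads via reshape
W_q_combined = np.concatenate(W_q_heads, axis=1)
q_combined = (x @ W_q_combined).reshape(num_tokens, num_heads, head_dim)

# Column block h of the wide projection equals head h's separate projection
for h in range(num_heads):
    assert np.allclose(q_combined[:, h, :], q_separate[h])
print("per-head projections match")
```

In other words, the "split" in a combined implementation is only a bookkeeping step over the output columns; each head still effectively has its own independent weights.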
In Section 3.4.1 we have this definition:
These three matrices are used to project the embedded input tokens, x^(i), into query, key, and value vectors, respectively, as illustrated in figure 3.14. These weight matrices are then initialized later to random tensors like so:
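The initialization code the question refers to is not reproduced in this thread. As a hedged illustration only, here is a minimal NumPy sketch (not the book's code; the sizes, seed, and variable names are assumptions) of three independent, randomly initialized matrices, each projecting the same embedded tokens into queries, keys, and values with its own matrix multiplication.

```python
import numpy as np

rng = np.random.default_rng(123)  # fixed seed for reproducibility (assumption)

# Hypothetical sizes: 6 tokens, 3-dim embeddings, 2-dim projections
num_tokens, d_in, d_out = 6, 3, 2

# Embedded input tokens: row i is x^(i)
inputs = rng.random((num_tokens, d_in))

# Three independent weight matrices, each randomly initialized on its own
W_query = rng.random((d_in, d_out))
W_key = rng.random((d_in, d_out))
W_value = rng.random((d_in, d_out))

# Each projection is a separate matrix multiplication over all tokens at once;
# nothing is split, and the three matrices share no parameters
queries = inputs @ W_query
keys = inputs @ W_key
values = inputs @ W_value

# Projecting a single token gives the same row as the batched result
q_2 = inputs[1] @ W_query
print(np.allclose(q_2, queries[1]))
```

Because the three matrices start from different random values and receive different gradients during training, they can learn different roles for the same input token.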