Matrices and Derivatives: Kronecker in the Realm of Matrices
NOTE: This article has been translated from Farsi using llama3-70b-8192 and groq.
I stated in my previous post that when we consider derivatives with respect to matrices as derivatives with respect to vectorized matrices (using the same $\operatorname{vec}$ operator), we no longer need to deal with tensors, covectors, and contravariant vectors. In exchange, we need to bring the Kronecker product from that world into this one. In tensor calculations, the Kronecker product creates a tensor whose number of dimensions equals the sum of the numbers of dimensions of its inputs; for example, the Kronecker product of two vectors results in a matrix.
If we have two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{p \times q}$, their Kronecker product is defined as:

$$A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{bmatrix}$$
For example:

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \otimes \begin{bmatrix} 0 & 5 \\ 6 & 7 \end{bmatrix} = \begin{bmatrix} 0 & 5 & 0 & 10 \\ 6 & 7 & 12 & 14 \\ 0 & 15 & 0 & 20 \\ 18 & 21 & 24 & 28 \end{bmatrix}$$
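As a quick check, NumPy's `np.kron` computes exactly this block structure (the specific numbers here are my own illustration, not necessarily the ones from the original post):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])

# each entry a_ij of A is replaced by the block a_ij * B
print(np.kron(A, B))
# [[ 0  5  0 10]
#  [ 6  7 12 14]
#  [ 0 15  0 20]
#  [18 21 24 28]]
```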
This product has a beautiful property that is essential for matrix derivative calculations:

$$\operatorname{vec}(AXB) = (B^\top \otimes A)\,\operatorname{vec}(X)$$
For example, let's calculate the derivative of $AXB$ with respect to $X$:

$$\frac{\partial\,\operatorname{vec}(AXB)}{\partial\,\operatorname{vec}(X)} = B^\top \otimes A$$
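Here is a minimal NumPy check of this property (the matrix sizes are my own choice). Note that $\operatorname{vec}$ must stack columns, which in NumPy means reshaping in column-major (Fortran) order:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

def vec(M):
    # column-stacking vectorization, as required by vec(AXB) = (B^T kron A) vec(X)
    return M.reshape(-1, order="F")

# the property itself
print(np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X)))  # True

# hence the Jacobian d vec(AXB) / d vec(X) is simply B^T kron A
J = np.kron(B.T, A)
print(J.shape)  # (10, 12): length of vec(AXB) by length of vec(X)
```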
Chain Rule and Product Rule
In the world of scalars, there was a set of general rules for derivatives that served as the starting point for most of our calculations. These rules also apply to the world of matrices. One of them is the rule for multiplication by a scalar, and another is the sum rule, which I mentioned in the previous post.
Let’s see why these rules hold:
And so on.
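As a sanity check in the vectorized convention, the sum and scalar rules can also be verified numerically; the particular functions $F(X) = AX$ and $G(X) = CX$ below are my own choice of example:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 3, 4, 2
A = rng.standard_normal((k, m))
C = rng.standard_normal((k, m))
c = 2.5

# Jacobians in the vec convention: d vec(AX)/d vec(X) = I_n kron A
J_F = np.kron(np.eye(n), A)  # F(X) = A X
J_G = np.kron(np.eye(n), C)  # G(X) = C X

# sum rule: d vec(F + G)/d vec(X) = J_F + J_G, since (F + G)(X) = (A + C) X
print(np.allclose(np.kron(np.eye(n), A + C), J_F + J_G))  # True

# scalar rule: d vec(c F)/d vec(X) = c J_F
print(np.allclose(np.kron(np.eye(n), c * A), c * J_F))  # True
```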
Similarly, we can calculate the derivative of composite functions:
And in general:

$$\frac{\partial\,\operatorname{vec}\big(F(G(X))\big)}{\partial\,\operatorname{vec}(X)} = \frac{\partial\,\operatorname{vec}(F)}{\partial\,\operatorname{vec}(G)}\;\frac{\partial\,\operatorname{vec}(G)}{\partial\,\operatorname{vec}(X)}$$
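One way to see the chain rule at work (my own example, not from the original post) is to split $AXB$ into the two steps $G(X) = XB$ and $F(Y) = AY$ and check that the product of the two Jacobians matches the direct formula $B^\top \otimes A$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p, k = 2, 3, 4, 5
A = rng.standard_normal((k, m))
B = rng.standard_normal((n, p))

# decompose A X B as F(G(X)) with G(X) = X B and F(Y) = A Y
J_G = np.kron(B.T, np.eye(m))  # d vec(X B) / d vec(X) = B^T kron I_m
J_F = np.kron(np.eye(p), A)    # d vec(A Y) / d vec(Y) = I_p kron A

# chain rule: outer Jacobian times inner Jacobian
J_chain = J_F @ J_G

# direct formula: d vec(A X B) / d vec(X) = B^T kron A
print(np.allclose(J_chain, np.kron(B.T, A)))  # True
```

The check works because of the mixed-product property $(I_p \otimes A)(B^\top \otimes I_m) = B^\top \otimes A$.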
It seems that those who defined the matrix product rule have laid a solid foundation. Everything fits together nicely.
But what about the product rule for two functions?
If you're tired of element-wise derivatives, you have every right to be; I'm tired too. So let's get help from Kronecker. First, consider the function $XY$. The derivatives of this function with respect to $X$ and $Y$, using the Kronecker relation above, are:

$$\frac{\partial\,\operatorname{vec}(XY)}{\partial\,\operatorname{vec}(X)} = Y^\top \otimes I, \qquad \frac{\partial\,\operatorname{vec}(XY)}{\partial\,\operatorname{vec}(Y)} = I \otimes X$$
Now, the derivative of a product of two functions, $F(X)\,G(X)$, is:

$$\frac{\partial\,\operatorname{vec}(FG)}{\partial\,\operatorname{vec}(X)} = (G^\top \otimes I)\,\frac{\partial\,\operatorname{vec}(F)}{\partial\,\operatorname{vec}(X)} + (I \otimes F)\,\frac{\partial\,\operatorname{vec}(G)}{\partial\,\operatorname{vec}(X)}$$
Note that the identity matrices paired with $G^\top$ and with $F$ should not be assumed to have the same dimensions: the identity matrix in the Kronecker product $G^\top \otimes I$ has size equal to the number of rows of $F$, while the identity matrix in $I \otimes F$ has size equal to the number of columns of $G$.
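As a numerical sanity check of the product rule (the functions $F(X) = AX$ and $G(X) = XB$ with a square $X$ are my own choice of example), the Kronecker-form Jacobian can be compared against a finite-difference approximation:

```python
import numpy as np

rng = np.random.default_rng(1)
m, a, b = 3, 2, 4
A = rng.standard_normal((a, m))
B = rng.standard_normal((m, b))
X = rng.standard_normal((m, m))

def vec(M):
    # column-stacking vectorization
    return M.reshape(-1, order="F")

def H(X):
    # H(X) = F(X) G(X) with F(X) = A X and G(X) = X B
    return (A @ X) @ (X @ B)

F, G = A @ X, X @ B
J_F = np.kron(np.eye(m), A)    # d vec(A X) / d vec(X)
J_G = np.kron(B.T, np.eye(m))  # d vec(X B) / d vec(X)

# product rule in Kronecker form
J_H = np.kron(G.T, np.eye(a)) @ J_F + np.kron(np.eye(b), F) @ J_G

# forward finite-difference check
eps = 1e-6
J_num = np.zeros((a * b, m * m))
for k in range(m * m):
    dX = np.zeros(m * m)
    dX[k] = eps
    J_num[:, k] = (vec(H(X + dX.reshape(m, m, order="F"))) - vec(H(X))) / eps
print(np.allclose(J_H, J_num, atol=1e-4))  # True
```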
There's a small issue left to discuss, which is the effect of transposition on the $\operatorname{vec}$ operator. For example, when calculating the derivative of an expression containing $X^\top$ with respect to $X$:
What is the value of $\operatorname{vec}(X^\top)$? Does it have a relation with $\operatorname{vec}(X)$? Of course it does. Let's consider a $3 \times 3$ matrix:

$$\operatorname{vec}(X) = \begin{bmatrix} x_{11} \\ x_{21} \\ x_{31} \\ x_{12} \\ x_{22} \\ x_{32} \\ x_{13} \\ x_{23} \\ x_{33} \end{bmatrix}, \qquad \operatorname{vec}(X^\top) = \begin{bmatrix} x_{11} \\ x_{12} \\ x_{13} \\ x_{21} \\ x_{22} \\ x_{23} \\ x_{31} \\ x_{32} \\ x_{33} \end{bmatrix}$$
The relation between these two vectors can be written as:

$$\operatorname{vec}(X^\top) = K_{m,n}\,\operatorname{vec}(X)$$
Matrices with exactly one 1 in each row and each column, the permutation matrices, simply reorder the rows of the vector (or matrix) they multiply. We call the permutation matrix that transforms $\operatorname{vec}(X)$ into $\operatorname{vec}(X^\top)$ the matrix $K_{m,n}$ (for $X \in \mathbb{R}^{m \times n}$; in the literature it is known as the commutation matrix). But what about the derivative of the Kronecker product itself? Let's forget about it for now. Just remember that the derivative of the Kronecker product results in something like:

$$\frac{\partial\,\operatorname{vec}(X \otimes Y)}{\partial\,\operatorname{vec}(X)} = \big(I_n \otimes K_{q,m} \otimes I_p\big)\big(I_{mn} \otimes \operatorname{vec}(Y)\big), \qquad X \in \mathbb{R}^{m \times n},\; Y \in \mathbb{R}^{p \times q}$$
This time, I wrote the dimensions of the identity matrices $I_n$ and $I_p$ explicitly, as it's crucial not to get them mixed up.
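A minimal NumPy sketch, assuming the standard commutation-matrix identity $\operatorname{vec}(X \otimes Y) = (I_n \otimes K_{q,m} \otimes I_p)(\operatorname{vec}(X) \otimes \operatorname{vec}(Y))$ (the notation follows the common convention; the original post's symbols may differ), which verifies both the transposition relation and the Kronecker-product Jacobian:

```python
import numpy as np

def vec(M):
    # column-stacking vectorization
    return M.reshape(-1, order="F")

def commutation(m, n):
    # permutation matrix K_{m,n} with K_{m,n} @ vec(X) == vec(X.T) for X of shape (m, n)
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1.0
    return K

rng = np.random.default_rng(0)
m, n, p, q = 2, 3, 4, 2
X = rng.standard_normal((m, n))
Y = rng.standard_normal((p, q))

# transposition is a permutation of vec(X)
print(np.allclose(commutation(m, n) @ vec(X), vec(X.T)))  # True

# vec of a Kronecker product: P = I_n kron K_{q,m} kron I_p
P = np.kron(np.eye(n), np.kron(commutation(q, m), np.eye(p)))
print(np.allclose(vec(np.kron(X, Y)), P @ np.kron(vec(X), vec(Y))))  # True

# the resulting Jacobian d vec(X kron Y) / d vec(X)
J = P @ np.kron(np.eye(m * n), vec(Y).reshape(-1, 1))

# finite-difference check (X kron Y is linear in X, so this is essentially exact)
eps = 1e-6
J_num = np.zeros((m * n * p * q, m * n))
for k in range(m * n):
    dX = np.zeros(m * n)
    dX[k] = eps
    J_num[:, k] = (vec(np.kron(X + dX.reshape(m, n, order="F"), Y)) - vec(np.kron(X, Y))) / eps
print(np.allclose(J, J_num, atol=1e-5))  # True
```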
Well, that’s it for today. We’ll discuss functions like inverse and determinant of matrices in the next post and solve some problems using these tools in the final post.