Matrices and Derivatives: Comradely functions

By Behrooz Vedadian on Jan 28th, 2019 · math

NOTE: This article has been translated from Farsi using llama3-70b-8192 and Groq

Now that we have worked through the initial problems on matrix derivatives, we can talk about derivatives of more general functions defined on matrices, such as the matrix inverse and the determinant.

Let’s start with the derivative of the inverse matrix. Multiplying the inverse of a matrix by the matrix itself yields the identity matrix of the same dimensions, so we can start from there:

$$\frac{\partial\, A^{-1}A}{\partial A} = \frac{\partial I}{\partial A} = 0$$

On the other hand, we also had a relationship for taking the derivative of the product of two matrix functions:

$$\frac{\partial\, f(A)g(A)}{\partial A} = \left(I \otimes f(A)\right)\frac{\partial g(A)}{\partial A} + \left(g(A)^T \otimes I\right)\frac{\partial f(A)}{\partial A}$$
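To make the convention concrete, here is a minimal NumPy check of this product rule (my own sketch, not part of the original post). It uses $f(A) = g(A) = A$, so both inner Jacobians $\partial f/\partial A$ and $\partial g/\partial A$ reduce to the $n^2 \times n^2$ identity and drop out:

```python
import numpy as np

# Convention used throughout this series (as I read it): dF/dA is the
# Jacobian of vec(F(A)) with respect to vec(A), where vec() stacks the
# columns of a matrix. NumPy's ravel(order="F") implements vec().

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
I = np.eye(n)

F = lambda X: X @ X                      # f(A) g(A) with f(A) = g(A) = A
dA = 1e-6 * rng.standard_normal((n, n))  # small random perturbation

# Product rule prediction: (I ⊗ A + A^T ⊗ I) vec(dA)
predicted = (np.kron(I, A) + np.kron(A.T, I)) @ dA.ravel(order="F")
actual = (F(A + dA) - F(A)).ravel(order="F")

print(np.allclose(actual, predicted, atol=1e-9))  # True, up to O(|dA|^2)
```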

Substituting $f(A) = A^{-1}$ and $g(A) = A$ into the first equation, and noting that $\partial A / \partial A = I$, we have:

$$\frac{\partial\, A^{-1}A}{\partial A} = \left(I \otimes A^{-1}\right) + \left(A^T \otimes I\right)\frac{\partial A^{-1}}{\partial A} = 0$$

Et voilà!

$$\frac{\partial A^{-1}}{\partial A} = -\left(A^T \otimes I\right)^{-1}\left(I \otimes A^{-1}\right) = -\left(A^{-T} \otimes A^{-1}\right)$$

To calculate this last part, I used two identities involving Kronecker products:

$$\left(A \otimes B\right)^{-1} = A^{-1} \otimes B^{-1}$$

$$\left(A \otimes B\right)\left(C \otimes D\right) = \left(AC\right) \otimes \left(BD\right)$$

Of course, these hold only under certain conditions. In the first identity, both $A$ and $B$ must be invertible; otherwise the Kronecker product is not invertible either. In the second, the products $AC$ and $BD$ must be defined, which is not automatic: the two Kronecker products may well be multipliable as ordinary matrices even when $AC$ and $BD$ are not.
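Both identities, and the final formula for the derivative of the inverse, are easy to verify numerically. Below is a sketch of mine; all sizes and seeds are arbitrary, and the non-square $C$ and $D$ illustrate the conformability requirement:

```python
import numpy as np

rng = np.random.default_rng(1)

# (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}, assuming A and B are invertible.
# The diagonal shift just keeps both matrices safely invertible.
A = rng.standard_normal((3, 3)) + 3 * np.eye(3)
B = rng.standard_normal((4, 4)) + 3 * np.eye(4)
print(np.allclose(np.linalg.inv(np.kron(A, B)),
                  np.kron(np.linalg.inv(A), np.linalg.inv(B))))  # True

# (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD), assuming AC and BD are defined.
C = rng.standard_normal((3, 2))  # conformable with A: (3x3)(3x2)
D = rng.standard_normal((4, 5))  # conformable with B: (4x4)(4x5)
print(np.allclose(np.kron(A, B) @ np.kron(C, D),
                  np.kron(A @ C, B @ D)))                        # True

# dA^{-1}/dA = -(A^{-T} ⊗ A^{-1}): perturb A slightly and compare the
# actual change of vec(A^{-1}) with the first-order prediction.
dA = 1e-6 * rng.standard_normal((3, 3))
Ainv = np.linalg.inv(A)
predicted = -np.kron(Ainv.T, Ainv) @ dA.ravel(order="F")
actual = (np.linalg.inv(A + dA) - Ainv).ravel(order="F")
print(np.allclose(actual, predicted, atol=1e-9))                 # True
```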

The next function is the determinant, and calculating its derivative is a bit more involved than the previous one. To obtain it, we need the adjugate matrix. The relationship between the determinant and the adjugate is quite elegant: select any row of the adjugate matrix and multiply it by the corresponding column of the original matrix, and the result is the determinant of the original matrix. That is, for any $i$ (between 1 and the number of columns of the original matrix):

$$|A| = A^{*}_{i,\cdot}\, A_{\cdot,i}$$

On the other hand, the inverse of a matrix can also be calculated using the adjugate matrix:

$$A^{-1} = \frac{1}{|A|}A^{*}$$
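To see these relations in action, the sketch below builds the adjugate directly from cofactors and checks both of them; the adjugate() helper is my own naming, not a NumPy function:

```python
import numpy as np

def adjugate(A):
    """Adjugate of A: entry (i, j) is the (j, i) cofactor, i.e.
    (-1)^(i+j) times the determinant of A with row j and column i removed."""
    n = A.shape[0]
    adj = np.empty_like(A)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
            adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
adjA, detA = adjugate(A), np.linalg.det(A)

# Row i of the adjugate times column i of A gives |A|, for every i.
print(np.allclose([adjA[i] @ A[:, i] for i in range(4)], detA))  # True

# And A^{-1} = A* / |A|.
print(np.allclose(np.linalg.inv(A), adjA / detA))                # True
```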

With these, we can finish the task. The key observation is that the entries of $A^{*}_{i,\cdot}$ are cofactors computed from the columns of $A$ other than the $i$-th, so they do not depend on $A_{\cdot,i}$ at all:

$$\frac{\partial |A|}{\partial A_{\cdot,i}} = A^{*}_{i,\cdot} \quad\Longrightarrow\quad \frac{\partial |A|}{\partial A} = \begin{bmatrix} A^{*}_{1,\cdot} & A^{*}_{2,\cdot} & \cdots & A^{*}_{n,\cdot} \end{bmatrix} = \operatorname{vec}^T\!\left(\left(A^{*}\right)^T\right)$$

Finally, we can replace $A^{*}$ with $|A|A^{-1}$ and we’re done:

$$\frac{\partial |A|}{\partial A} = \operatorname{vec}^T\!\left(\left(|A|A^{-1}\right)^T\right) = |A|\,\operatorname{vec}^T\!\left(A^{-T}\right)$$

If we consider $\ln|A|$ instead, the result becomes even more beautiful:

$$\frac{\partial \ln|A|}{\partial A} = \operatorname{vec}^T\!\left(A^{-T}\right)$$

Can you tell why?
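(Hint: the chain rule.) Both determinant results can also be checked by finite differences. The sketch below is my own illustration; the diagonal shift keeps $A$ well conditioned with $|A| > 0$ so the logarithm is defined:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)  # well conditioned, det > 0
dA = 1e-6 * rng.standard_normal((4, 4))          # small random perturbation

detA = np.linalg.det(A)
grad_det = detA * np.linalg.inv(A).T.ravel(order="F")  # |A| vec^T(A^{-T})

# d|A| ≈ |A| vec^T(A^{-T}) vec(dA)
actual = np.linalg.det(A + dA) - detA
print(np.isclose(actual, grad_det @ dA.ravel(order="F"), rtol=1e-4))   # True

# d ln|A| ≈ vec^T(A^{-T}) vec(dA)
actual_log = np.log(np.linalg.det(A + dA)) - np.log(detA)
print(np.isclose(actual_log,
                 (grad_det / detA) @ dA.ravel(order="F"), rtol=1e-4))  # True
```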

The last function we’ll discuss in this section is the inner product of two matrices, which is equal to the product of their vectorized forms:

$$\langle A, B \rangle = \operatorname{vec}^T(A)\,\operatorname{vec}(B) = \operatorname{trace}\!\left(A^T B\right)$$

The trace form is how this inner product is usually introduced in most texts. The equality is easy to see:

$$\operatorname{trace}\!\left(A^T B\right) = \sum_i \left(A^T B\right)_{i,i} = \sum_i A_{\cdot,i}^T B_{\cdot,i}$$

$$\operatorname{vec}(A) = \begin{bmatrix} A_{\cdot,1} \\ A_{\cdot,2} \\ \vdots \\ A_{\cdot,m} \end{bmatrix},\qquad \operatorname{vec}(B) = \begin{bmatrix} B_{\cdot,1} \\ B_{\cdot,2} \\ \vdots \\ B_{\cdot,m} \end{bmatrix}$$

$$\operatorname{vec}^T(A)\,\operatorname{vec}(B) = \sum_i A_{\cdot,i}^T B_{\cdot,i}$$

If you’ve been paying attention, you’ll notice that $A$ and $B$ must have the same dimensions. For $A^T B$ to exist, the two matrices need the same number of rows, and since the trace is typically defined for square matrices, $A^T B$ must also be square, which forces $A$ and $B$ to have the same number of columns as well. This is why this inner product is defined for matrices of matching shape.
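All three expressions are easy to compare directly; this sketch is mine, with arbitrary but equal shapes, as the argument above requires:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((5, 3))

as_vecs = A.ravel(order="F") @ B.ravel(order="F")       # vec^T(A) vec(B)
as_trace = np.trace(A.T @ B)                            # trace(A^T B)
by_columns = sum(A[:, i] @ B[:, i] for i in range(3))   # column-wise sums

print(np.allclose([as_vecs, as_trace], by_columns))     # True
```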

The derivative of the inner product of two matrices with respect to one of them is, as expected, the transposed vectorized form of the other:

$$\frac{\partial \operatorname{trace}\!\left(A^T B\right)}{\partial B} = \operatorname{vec}^T(A),\qquad \frac{\partial \operatorname{trace}\!\left(A^T B\right)}{\partial A} = \operatorname{vec}^T(B)$$

This is the same as what we saw for vectors in the “Linear Algebra II” course at university:

$$\frac{\partial\, a^T b}{\partial b} = a^T,\qquad \frac{\partial\, a^T b}{\partial a} = b^T$$
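Because the inner product is linear in each argument, the first-order identity holds exactly, with no higher-order remainder; the perturbation in this sketch of mine does not even need to be small:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((5, 3))
dB = rng.standard_normal((5, 3))  # not small: linearity makes this exact

# Change in <A, B> under B -> B + dB, versus vec^T(A) vec(dB)
change = np.trace(A.T @ (B + dB)) - np.trace(A.T @ B)
predicted = A.ravel(order="F") @ dB.ravel(order="F")
print(np.isclose(change, predicted))  # True up to rounding
```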

After three posts on derivatives with respect to matrices, we have reached a point where we can solve problems like estimating the covariance matrix of a Gaussian distribution from its samples. We’ll solve two examples of these problems in the next post, which will conclude our discussion on matrix derivatives.