Matrices and Derivatives: Solving Some Problems
NOTE: This article has been translated from Farsi using llama3-70b-8192 and groq.
Derivative of Correlation Matrix Estimation and Neural Network Weights
The primary use of derivatives is to find extremum points; however, a derivative can be used to reach an extremum either through iterative optimization methods or by solving closed-form equations. To illustrate each of these approaches, we will first estimate the correlation matrix of a Gaussian distribution and then compute the gradient of an MLP (Multi-Layer Perceptron) with respect to its weights.
Estimating the Correlation Matrix
The multivariate Gaussian distribution is formulated as follows:
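$$p(x) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)$$

where $x \in \mathbb{R}^{d}$, $\mu$ is the mean vector, and $\Sigma$ is the $d \times d$ correlation (covariance) matrix we want to estimate.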
Now, if we have samples $x_1$ to $x_N$ from this distribution, which values of $\mu$ and $\Sigma$ maximize the likelihood of these samples?
Assuming independence among samples, the likelihood of all samples becomes:
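$$L(\mu, \Sigma) = \prod_{n=1}^{N} p(x_n) = \prod_{n=1}^{N} \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}\left(x_n-\mu\right)^{T}\Sigma^{-1}\left(x_n-\mu\right)\right)$$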
Taking the logarithm turns the product into a sum and simplifies the expression:
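$$\ln L(\mu, \Sigma) = -\frac{Nd}{2}\ln(2\pi) - \frac{N}{2}\ln|\Sigma| - \frac{1}{2}\sum_{n=1}^{N}\left(x_n-\mu\right)^{T}\Sigma^{-1}\left(x_n-\mu\right)$$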
To find the maximizing values, we take derivatives with respect to $\mu$ and $\Sigma$ and set them to zero.
For the first derivative, with respect to $\mu$:
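$$\frac{\partial \ln L}{\partial \mu} = \Sigma^{-1}\sum_{n=1}^{N}\left(x_n-\mu\right) = 0 \quad\Longrightarrow\quad \mu = \frac{1}{N}\sum_{n=1}^{N} x_n$$

so the maximum-likelihood estimate of the mean is simply the sample average.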
The second derivative, with respect to $\Sigma$:
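$$\frac{\partial \ln L}{\partial \Sigma} = -\frac{N}{2}\Sigma^{-1} + \frac{1}{2}\,\Sigma^{-1}\left(\sum_{n=1}^{N}\left(x_n-\mu\right)\left(x_n-\mu\right)^{T}\right)\Sigma^{-1} = 0$$

Here we used the identities $\frac{\partial}{\partial \Sigma}\ln|\Sigma| = \Sigma^{-1}$ and $\frac{\partial}{\partial \Sigma}\,(x-\mu)^{T}\Sigma^{-1}(x-\mu) = -\Sigma^{-1}(x-\mu)(x-\mu)^{T}\Sigma^{-1}$, both of which hold for symmetric $\Sigma$.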
Simplifying, we get:
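$$N\,\Sigma^{-1} = \Sigma^{-1}\left(\sum_{n=1}^{N}\left(x_n-\mu\right)\left(x_n-\mu\right)^{T}\right)\Sigma^{-1}$$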
Using the relationship $\Sigma\,\Sigma^{-1} = I$ (multiplying both sides by $\Sigma$ from the left and the right), we obtain:
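$$\Sigma = \frac{1}{N}\sum_{n=1}^{N}\left(x_n-\mu\right)\left(x_n-\mu\right)^{T}$$

Both estimates therefore come out in closed form: the sample mean and the sample covariance. As a quick sanity check, the closed-form estimates can be computed directly from data; the snippet below is a minimal NumPy sketch (the variable names and the particular "true" parameters are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw N samples from a 2-D Gaussian with known parameters (for checking).
N, d = 10_000, 2
true_mu = np.array([1.0, -2.0])
true_Sigma = np.array([[2.0, 0.6],
                       [0.6, 1.0]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=N)  # shape (N, d)

# Closed-form maximum-likelihood estimates derived above.
mu_hat = X.mean(axis=0)                # (1/N) * sum_n x_n
centered = X - mu_hat
Sigma_hat = centered.T @ centered / N  # (1/N) * sum_n (x_n - mu)(x_n - mu)^T

print("mu_hat    =", mu_hat)
print("Sigma_hat =\n", Sigma_hat)
```

With enough samples, `mu_hat` and `Sigma_hat` come out close to `true_mu` and `true_Sigma`.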
Derivative of MLP Weights
The layers of an MLP can be represented as:
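Using $a^{(l)}$ for the output of layer $l$, $W^{(l)}$ and $b^{(l)}$ for its weight matrix and bias, and $f$ for an element-wise activation function, one common way to write this is:

$$z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}, \qquad a^{(l)} = f\!\left(z^{(l)}\right), \qquad l = 1, \dots, L$$

with $a^{(0)} = x$ the network input.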
Assuming we have an MLP with multiple layers, when we want to train it, we compare the output of the final ($L$-th) layer with the desired result and calculate the error. Let's assume we use the MSE (Mean Squared Error) criterion:
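For a desired output $y$ (with the conventional factor of $\frac{1}{2}$ so that it cancels when differentiating):

$$E = \frac{1}{2}\left\|a^{(L)} - y\right\|^{2} = \frac{1}{2}\left(a^{(L)} - y\right)^{T}\left(a^{(L)} - y\right)$$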
Using the machinery we built for derivatives, we calculate $\frac{\partial E}{\partial W^{(l)}}$:
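By the chain rule, the error must be propagated back from the output through every intermediate layer:

$$\frac{\partial E}{\partial W^{(l)}} = \frac{\partial E}{\partial a^{(L)}}\,\frac{\partial a^{(L)}}{\partial a^{(L-1)}}\cdots\frac{\partial a^{(l+1)}}{\partial a^{(l)}}\,\frac{\partial a^{(l)}}{\partial W^{(l)}}$$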
Breaking down each component:
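$$\frac{\partial E}{\partial a^{(L)}} = a^{(L)} - y, \qquad \frac{\partial a^{(l)}}{\partial z^{(l)}} = \operatorname{diag}\!\left(f'\!\left(z^{(l)}\right)\right), \qquad \frac{\partial z^{(l)}}{\partial a^{(l-1)}} = W^{(l)}$$

while differentiating $z^{(l)} = W^{(l)}a^{(l-1)} + b^{(l)}$ with respect to $W^{(l)}$ itself leaves a factor of $\left(a^{(l-1)}\right)^{T}$.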
Simplifying the expression:
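Collecting the repeated factors into an error signal $\delta^{(l)}$ for each layer:

$$\frac{\partial E}{\partial W^{(l)}} = \delta^{(l)}\left(a^{(l-1)}\right)^{T}, \qquad \delta^{(L)} = \operatorname{diag}\!\left(f'\!\left(z^{(L)}\right)\right)\left(a^{(L)} - y\right), \qquad \delta^{(l)} = \operatorname{diag}\!\left(f'\!\left(z^{(l)}\right)\right)\left(W^{(l+1)}\right)^{T}\delta^{(l+1)}$$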
Using the Hadamard product (element-wise multiplication), we get:
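$$\delta^{(L)} = \left(a^{(L)} - y\right)\odot f'\!\left(z^{(L)}\right), \qquad \delta^{(l)} = \left(\left(W^{(l+1)}\right)^{T}\delta^{(l+1)}\right)\odot f'\!\left(z^{(l)}\right), \qquad \frac{\partial E}{\partial W^{(l)}} = \delta^{(l)}\left(a^{(l-1)}\right)^{T}$$

since $\operatorname{diag}(v)\,u = v \odot u$. This is exactly the backpropagation rule, and the recursion translates almost line-by-line into code. The following is a minimal NumPy sketch for a small network, assuming a tanh activation and illustrative layer sizes (neither is specified in the text above):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(z):        # element-wise activation; tanh is just an example choice
    return np.tanh(z)

def f_prime(z):  # its element-wise derivative
    return 1.0 - np.tanh(z) ** 2

# A tiny MLP: 3 inputs -> 4 hidden units -> 2 outputs, column-vector convention.
sizes = [3, 4, 2]
W = [rng.standard_normal((sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
b = [rng.standard_normal((sizes[l + 1], 1)) for l in range(len(sizes) - 1)]

x = rng.standard_normal((sizes[0], 1))   # input
y = rng.standard_normal((sizes[-1], 1))  # desired output

# Forward pass: store a^{(l)} and z^{(l)} for every layer.
a, z = [x], []
for Wl, bl in zip(W, b):
    z.append(Wl @ a[-1] + bl)
    a.append(f(z[-1]))

# Backward pass, following the Hadamard-product recursion above.
delta = (a[-1] - y) * f_prime(z[-1])          # delta^{(L)}
grads = [None] * len(W)
for l in reversed(range(len(W))):
    grads[l] = delta @ a[l].T                 # dE/dW^{(l)} = delta^{(l)} a^{(l-1)}^T
    if l > 0:
        delta = (W[l].T @ delta) * f_prime(z[l - 1])

print([g.shape for g in grads])  # [(4, 3), (2, 4)]
```

Checking a few entries of `grads` against finite differences of $E$ is an easy way to verify the recursion.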
In conclusion, both problems reduce to differentiation: the correlation matrix estimate follows from the derivative of the log-likelihood and can be solved in closed form, while the MLP weight gradients follow from the derivative of the error criterion and feed an iterative optimization.