The transpose method creates the transpose of B in a buffer. This method gives the fastest result: matrix multiplication goes as O(n^3) and the transpose as O(n^2), so doing the transpose is at least x times cheaper than the multiplication itself. The wiki method without blocking is also fast and does not need a buffer. The blocking method ...

Consider two square matrices A and B of size n that have to be multiplied:

1. Partition these matrices into p square blocks, where p is the number of processes available.
2. Create a matrix of processes of size p^(1/2) x p^(1/2), so that each process can hold one block of matrix A and one block of matrix B.
3. ...

Blocked Matrix Multiplication using OpenMP

Blocked matrix multiplication is a technique in which you split a matrix into 'blocks' and compute the result one block at a time. This can be useful for larger matrices, where spatial locality in the cache comes into play.
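
The transpose method and the blocked method described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the code the timings refer to: the function names matmul_transpose and matmul_blocked, the row-major layout, and the block-size parameter bs are all hypothetical choices made for the example. The OpenMP pragmas are ignored if the compiler is not run with OpenMP enabled, so the code also builds as plain serial C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Transpose method (sketch): C = A * B for row-major n x n matrices.
 * B is first copied into a transposed buffer Bt, so the inner loop
 * reads both A and Bt with stride 1. Returns 0 on success, -1 if the
 * buffer cannot be allocated. Assumes n > 0. */
int matmul_transpose(size_t n, const double *A, const double *B, double *C)
{
    assert(n > 0);
    double *Bt = malloc(n * n * sizeof *Bt);   /* the O(n^2) buffer */
    if (Bt == NULL)
        return -1;

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            Bt[j * n + i] = B[i * n + j];

    /* Each iteration of the outer loop writes its own row of C. */
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * Bt[j * n + k];
            C[i * n + j] = sum;
        }
    }

    free(Bt);
    return 0;
}

/* Blocked method (sketch): the same product computed one bs x bs tile
 * at a time, so the tiles being worked on are more likely to stay in
 * cache. bs is an assumed tuning parameter; n need not be a multiple
 * of bs, edge tiles are simply smaller. */
void matmul_blocked(size_t n, size_t bs,
                    const double *A, const double *B, double *C)
{
    assert(bs > 0);
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0;

    /* Threads split the row blocks, so no two threads write the same row. */
    #pragma omp parallel for
    for (size_t ii = 0; ii < n; ii += bs) {
        size_t i_end = (ii + bs < n) ? ii + bs : n;
        for (size_t kk = 0; kk < n; kk += bs) {
            size_t k_end = (kk + bs < n) ? kk + bs : n;
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t j_end = (jj + bs < n) ? jj + bs : n;
                /* Multiply one tile of A by one tile of B. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

With GCC or Clang, the pragmas take effect when compiling with -fopenmp. Which variant is faster, and which block size works best, depends on the matrix size and the cache hierarchy of the machine, so both are worth timing rather than assuming.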
