While it is common (and often unavoidable) in C, you should never return the result of malloc or new from a function as a raw pointer in C++. It will inevitably lead to memory leaks.
In C++, you return data structures that manage the memory for you, e.g. std::vector, std::unique_ptr, Eigen::Matrix4d, etc. but never raw owning pointers.
Even in a pure C setting, I would argue that a matrix multiplication function should not allocate. The standard way is to use an output parameter for the result, this is what the traditional BLAS xGEMM routines do, for example. The calling code is responsible for allocating the storage for the result of the multiplication. It can choose to allocate it on the stack if necessary, and it can reuse other variables to limit the peak memory usage and number of allocations.
On desktop CPUs, you don't want each matrix operation to allocate memory, because allocation is slow and undeterministic, and you get slow, uncached heap memory.
On microcontrollers, you want to avoid dynamic allocations because they often lead to memory fragmentation.
As mentioned earlier, the layout of the “matrix” you're using is not actually a 2D array, it's an array of 4 pointers (float *) that point to arrays of 4 floats that might reside elsewhere in memory. It's not a contiguous area of memory, requires an additional indirection to access specific elements, and uses unnecessarily many allocations to set up.
For a 4×4 matrix, allocate an array of 16 floats instead, and index it as array[col + 4*row] or array[row + 4*col], depending on whether you're used to the C or Fortran convention (many linear algebra libraries ─ even in C and C++ ─ use Fortran conventions, i.e. column-major order, whereas 2D arrays in C and C++ are laid out in row-major order).
Finally, don't declare your arguments as float MatrixA[4][4], the compiler won't check the dimensions, you can pass an arrays with 100 rows or just 1 row as a parameter and the compiler won't even warn you.
Use a reference to an array instead. While you're at it, make them const read-only references, because you're not changing MatrixA and MatrixB.
For example:
void matmul(const float (&matrixA)[4][4], const float (&matrixB)[4][4], float (&result)[4][4]) {
for (size_t r = 0; r < 4; ++r)
for (size_t c = 0; c < 4; ++c)
result[r][c] = 0;
for (size_t r = 0; r < 4; ++r)
for (size_t k = 0; k < 4; ++k)
for (size_t c = 0; c < 4; ++c)
result[r][c] += matrixA[r][k] * matrixB[k][c];
}
Example usage:
void matprint(const float (&matrix)[4][4]) {
for (size_t r = 0; r < 4; ++r)
for (size_t c = 0; c < 4; ++c)
Serial.print(matrix[r][c]), Serial.print(c == 3 ? "\n" : ",\t");
}
void matmul(const float (&matrixA)[4][4], const float (&matrixB)[4][4], float (&result)[4][4]) {
for (size_t r = 0; r < 4; ++r)
for (size_t c = 0; c < 4; ++c)
result[r][c] = 0;
for (size_t r = 0; r < 4; ++r)
for (size_t k = 0; k < 4; ++k)
for (size_t c = 0; c < 4; ++c)
result[r][c] += matrixA[r][k] * matrixB[k][c];
}
void setup() {
Serial.begin(115200);
while (!Serial);
float matrixA[][4] {
{2, 3, 5, 7},
{11, 13, 17, 19},
{23, 29, 31, 37},
{41, 43, 47, 53},
};
float matrixB[][4] {
{59, 61, 67, 71},
{73, 79, 83, 89},
{97, 101, 103, 107},
{109, 113, 127, 131},
};
float matrixAtimesB[4][4];
matmul(matrixA, matrixB, matrixAtimesB);
matprint(matrixA);
Serial.println("×");
matprint(matrixB);
Serial.println('=');
matprint(matrixAtimesB);
}
void loop() {}
Pieter