You will use the GCC compiler to test and answer the following questions. (Please run the tests yourself and record the relevant data for your report; do not use any data or answers from the Internet.)
1. Assume the data cache is fully associative with 60 lines, each line holds 10 doubles, the replacement policy is least recently used (LRU), matrix elements are doubles, and n = 4000. What is the total number of read cache misses for each matrix in each of the following three matrix multiplication algorithms? Implement these algorithms and verify the correctness of your implementation by checking the maximum error in the matrix C. Compile your code using gcc without any optimization flag and run your executables on any computer you can access. Rank the execution times and explain why they differ.
/* ijk version */
for (i=0; i<n; i++)
    for (j=0; j<n; j++)
        for (k=0; k<n; k++)
            c[i*n+j]=c[i*n+j]+a[i*n+k]*b[k*n+j];
/* ikj version */
for (i=0; i<n; i++)
    for (k=0; k<n; k++)
        for (j=0; j<n; j++)
            c[i*n+j]=c[i*n+j]+a[i*n+k]*b[k*n+j];
/* kji version */
for (k=0; k<n; k++)
    for (j=0; j<n; j++)
        for (i=0; i<n; i++)
            c[i*n+j]=c[i*n+j]+a[i*n+k]*b[k*n+j];
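
The cache model in question 1 can be checked against a hand analysis with a small simulator. The following is only a sketch under stated assumptions, not part of the assignment: each matrix is assumed to start on a cache-line boundary, the reads in each statement are assumed to occur in the order c, a, b, the store to c is assumed to refresh the line's recency without counting as a read miss, and nothing else (loop variables, code) is assumed to occupy the cache. The names Cache, cache_access, Order, and count_read_misses are hypothetical.

```c
#define CACHE_LINES  60   /* lines in the fully associative cache */
#define LINE_DOUBLES 10   /* doubles held by one cache line */

enum { MAT_A = 0, MAT_B = 1, MAT_C = 2 };
typedef enum { ORDER_IJK, ORDER_IKJ, ORDER_KJI } Order;

typedef struct {
    long tag[CACHE_LINES];                /* block id in each line, -1 if empty */
    unsigned long long used[CACHE_LINES]; /* time of last use, 0 if never used */
    unsigned long long now;               /* access counter used as a clock */
} Cache;

static void cache_reset(Cache *c) {
    for (int s = 0; s < CACHE_LINES; s++) { c->tag[s] = -1; c->used[s] = 0; }
    c->now = 0;
}

/* Touch one element of one matrix; returns 1 on a miss, 0 on a hit. */
static int cache_access(Cache *c, int matrix, long index) {
    /* Unique block id per (matrix, line): assumes line-aligned matrices. */
    long block = (index / LINE_DOUBLES) * 3 + matrix;
    int victim = 0;
    c->now++;
    for (int s = 0; s < CACHE_LINES; s++) {
        if (c->tag[s] == block) { c->used[s] = c->now; return 0; }
        if (c->used[s] < c->used[victim]) victim = s;  /* track LRU line */
    }
    c->tag[victim] = block;       /* evict the least recently used line */
    c->used[victim] = c->now;
    return 1;
}

/* Fills misses[MAT_A], misses[MAT_B], misses[MAT_C] with read-miss counts. */
void count_read_misses(Order order, long n, long long misses[3]) {
    Cache cache;
    cache_reset(&cache);
    misses[MAT_A] = misses[MAT_B] = misses[MAT_C] = 0;
    for (long x = 0; x < n; x++)
        for (long y = 0; y < n; y++)
            for (long z = 0; z < n; z++) {
                long i, j, k;
                switch (order) {          /* map loop nest to (i, j, k) */
                case ORDER_IJK: i = x; j = y; k = z; break;
                case ORDER_IKJ: i = x; k = y; j = z; break;
                default:        k = x; j = y; i = z; break;
                }
                misses[MAT_C] += cache_access(&cache, MAT_C, i * n + j);
                misses[MAT_A] += cache_access(&cache, MAT_A, i * n + k);
                misses[MAT_B] += cache_access(&cache, MAT_B, k * n + j);
                cache_access(&cache, MAT_C, i * n + j); /* store: not counted */
            }
}
```

Calling count_read_misses from a small main with a reduced n gives counts to compare with a formula derived by hand. Because every access searches all 60 lines, running it at n = 4000 is impractical; note also that a row of n doubles spans n/10 lines, so it exceeds the 60-line cache only when n is greater than 600.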
2. Compile the attached simple matrix multiplication code dgemm-simple.c using gcc without any optimization flag and report the execution time of the program on any computer you can access. Calculate how much faster dgemm-optimized runs than dgemm-simple. As we explained in class, architecture knowledge is very important when studying or designing compilers and operating systems. Read the program dgemm-optimized.c and briefly explain how architecture knowledge may help programmers write programs that run faster.
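
As one illustration of how cache knowledge can shape code, the sketch below contrasts a straightforward product with a loop-blocked (tiled) one. The contents of dgemm-optimized.c are not reproduced here, and blocking is only an assumed example of the kind of technique such a file might use. The names matmul_simple, matmul_blocked, max_abs_diff, and elapsed_seconds, as well as the block size bs, are hypothetical.

```c
#include <time.h>

/* Straightforward ijk product: c += a * b, row-major n x n matrices. */
void matmul_simple(int n, const double *a, const double *b, double *c) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* Blocked product: processes bs x bs tiles so that a tile of b and c is
   reused while it is likely still in cache. bs is a tuning parameter. */
void matmul_blocked(int n, int bs, const double *a, const double *b, double *c) {
    for (int ii = 0; ii < n; ii += bs)
        for (int kk = 0; kk < n; kk += bs)
            for (int jj = 0; jj < n; jj += bs) {
                /* Clamp tile bounds so n need not be a multiple of bs. */
                int imax = ii + bs < n ? ii + bs : n;
                int kmax = kk + bs < n ? kk + bs : n;
                int jmax = jj + bs < n ? jj + bs : n;
                for (int i = ii; i < imax; i++)
                    for (int k = kk; k < kmax; k++) {
                        double aik = a[i * n + k];   /* reused across j */
                        for (int j = jj; j < jmax; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}

/* Largest absolute element-wise difference between two n x n matrices. */
double max_abs_diff(int n, const double *x, const double *y) {
    double worst = 0.0;
    for (long t = 0; t < (long)n * n; t++) {
        double d = x[t] - y[t];
        if (d < 0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}

/* CPU time in seconds between two clock() readings. */
double elapsed_seconds(clock_t start, clock_t end) {
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To compare two versions, take clock() before and after each call, convert with elapsed_seconds, and report the ratio of the slower time to the faster time as the speedup. The max_abs_diff value between the two results serves as the correctness check.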