Create two 1M x 1K float matrices, matA and matB, and compute matA + matB.
- Compute the result row by row and then column by column, and compare the performance of the two traversal orders.
- Compile with -O3 and measure how much the speed improves.
- Try to improve the speed further using SIMD. Does the speed actually improve? Why or why not?
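
The two traversal orders could be sketched as follows. This is only a minimal starting point, assuming the matrices are stored row-major in flat float arrays; the names add_row_major, add_col_major, add_fn and time_add are illustrative and not part of the exercise. Dimensions are passed as parameters because a full 1M x 1K float matrix holds about 10^9 elements (roughly 4 GB), so it may be practical to try smaller sizes first.

```c
#include <stddef.h>
#include <time.h>

/* Signature shared by both traversal orders (hypothetical name). */
typedef void (*add_fn)(const float *a, const float *b, float *out,
                       size_t rows, size_t cols);

/* Row by row: the inner loop walks contiguous memory. */
void add_row_major(const float *a, const float *b, float *out,
                   size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            out[i * cols + j] = a[i * cols + j] + b[i * cols + j];
}

/* Column by column over the same row-major storage:
 * the inner loop jumps by `cols` floats on every step. */
void add_col_major(const float *a, const float *b, float *out,
                   size_t rows, size_t cols)
{
    for (size_t j = 0; j < cols; ++j)
        for (size_t i = 0; i < rows; ++i)
            out[i * cols + j] = a[i * cols + j] + b[i * cols + j];
}

/* CPU time in seconds for one call of fn, measured with clock(). */
double time_add(add_fn fn, const float *a, const float *b, float *out,
                size_t rows, size_t cols)
{
    clock_t start = clock();
    fn(a, b, out, rows, cols);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_add once with add_row_major and once with add_col_major on the same inputs gives the two timings to compare, first without optimization and then with -O3. Compilers such as GCC and Clang can often auto-vectorize the contiguous inner loop at -O3, which is worth keeping in mind when judging what hand-written SIMD adds.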