Saturday, September 21, 2019

c - Optimizing code using Intel SSE intrinsics for vectorization

This is my very first time working with SSE intrinsics. I am trying to convert a simple piece of code into a faster version using Intel SSE intrinsic (up to SSE4.2). I seem to encounter a number of errors.


The scalar version of the code is: (simple matrix multiplication)


     void mm(int n, double *A, double *B, double *C)
{
int i,j,k;
double tmp;
for(i = 0; i < n; i++)
for(j = 0; j < n; j++) {
tmp = 0.0;
for(k = 0; k < n; k++)
tmp += A[n*i+k] *
B[n*k+j];
C[n*i+j] = tmp;
}
}

This is my version: I have included #include


      void mm_sse(int n, double *A, double *B, double *C)
{
int i,j,k;
double tmp;
__m128d a_i, b_i, c_i;
for(i = 0; i < n; i++)
for(j = 0; j < n; j++) {
tmp = 0.0;
for(k = 0; k < n; k+=4)
a_i = __mm_load_ps(&A[n*i+k]);
b_i = __mm_load_ps(&B[n*k+j]);
c_i = __mm_load_ps(&C[n*i+j]);
__m128d tmp1 = __mm_mul_ps(a_i,b_i);
__m128d tmp2 = __mm_hadd_ps(tmp1,tmp1);
__m128d tmp3 = __mm_add_ps(tmp2,tmp3);
__mm_store_ps(&C[n*i+j], tmp3);
}
}

Where am I going wrong with this? I am getting several errors like this:


mm_vec.c(84): error: a value of type "int" cannot be assigned to an entity of type "__m128d"
a_i = __mm_load_ps(&A[n*i+k]);


This is how I am compiling: icc -O2 mm_vec.c -o vec


Can someone please assist me converting this code accurately. Thanks!


UPDATE:


According to your suggestions, I have made the following changes:


       void mm_sse(int n, float *A, float *B, float *C)
{
int i,j,k;
float tmp;
__m128 a_i, b_i, c_i;
for(i = 0; i < n; i++)
for(j = 0; j < n; j++) {
tmp = 0.0;
for(k = 0; k < n; k+=4)
a_i = _mm_load_ps(&A[n*i+k]);
b_i = _mm_load_ps(&B[n*k+j]);
c_i = _mm_load_ps(&C[n*i+j]);
__m128 tmp1 = _mm_mul_ps(a_i,b_i);
__m128 tmp2 = _mm_hadd_ps(tmp1,tmp1);
__m128 tmp3 = _mm_add_ps(tmp2,tmp3);
_mm_store_ps(&C[n*i+j], tmp3);
}
}

But now I seem to be getting a Segmentation fault. I know this perhaps because I am not accessing the array subscripts properly for array A,B,C. I am very new to this and not sure how to proceed with this.


Please help me determine the correct approach towards handling this code.

No comments:

Post a Comment

hard drive - Leaving bad sectors in unformatted partition?

Laptop was acting really weird, and copy and seek times were really slow, so I decided to scan the hard drive surface. I have a couple hundr...