Berk U. February 2016

Numpy: does matrix-vector multiplication not skip computation when some vector elements are equal to zero?

I've recently been working on a project where a majority of my time is spent multiplying a dense matrix A and a sparse vector v (see here). In my attempts to reduce computation, I've noticed that the runtime of A.dot(v) is not affected by the number of zero entries of v.

To explain why I would expect the runtime to improve in this case, let result = A.dot.v so that result[j] = sum_i(A[i,j]*v[j]) for j = 1...v.shape[0]. If v[j] = 0 then clearly result[j] = 0 no matter the values A[::,j]. In this case, I would therefore expect numpy to just set result[j] = 0 but it seems as if it goes ahead and computes sum_i(A[i,j]*v[j]) anyways.

I went ahead and wrote a short sample script to confirm this behavior below.

import time
import numpy as np

np.__config__.show() #make sure BLAS/LAPACK is being used
np.random.seed(seed = 0)
n_rows, n_cols = 1e5, 1e3

#initialize matrix and vector
A = np.random.rand(n_rows, n_cols)
u = np.random.rand(n_cols)
u = np.require(u, dtype=A.dtype, requirements = ['C'])

#time
start_time = time.time()
A.dot(u)
print "time with %d non-zero entries: %1.5f seconds" % (sum(u==0.0), (time.time() - start_time))

#set all but one entry of u to zero
v = u
set_to_zero = np.random.choice(np.array(range(0, u.shape[0])), size = (u.shape[0]-2), replace=False)
v[set_to_zero] = 0.0

start_time = time.time()
A.dot(v)
print "time with %d non-zero entries: %1.5f seconds" % (sum(v==0.0), (time.time() - start_time))


#what I would really expect it to take
non_zero_index = np.squeeze(v != 0.0)
A_effective = A[::,non_zero_index]
v_effective = v[non_zero_index]


start_time = time.time()
A_effective.dot(v_effective)
print "expected time with %d non-zero entrie        

Answers


thodnev February 2016

For simple arrays, Numpy doesn't perform such optimizations, but if You need, You may use sparse matrices which may improve dot product timings in that case. For more on the topic, see: http://docs.scipy.org/doc/scipy/reference/sparse.html


hpaulj February 2016

How about experimenting with a simple function like?

def dot2(A,v):
    ind = np.where(v)[0]
    return np.dot(A[:,ind],v[ind])

In [352]: A=np.ones((100,100))

In [360]: timeit v=np.zeros((100,));v[::60]=1;dot2(A,v)
10000 loops, best of 3: 35.4 us per loop

In [362]: timeit v=np.zeros((100,));v[::40]=1;dot2(A,v)
10000 loops, best of 3: 40.1 us per loop

In [364]: timeit v=np.zeros((100,));v[::20]=1;dot2(A,v)
10000 loops, best of 3: 46.5 us per loop

In [365]: timeit v=np.zeros((100,));v[::60]=1;np.dot(A,v)
10000 loops, best of 3: 29.2 us per loop

In [366]: timeit v=np.zeros((100,));v[::20]=1;np.dot(A,v)
10000 loops, best of 3: 28.7 us per loop

A fully iterative Python implentation would be:

def dotit(A,v, test=False):
    n,m = A.shape  
    res = np.zeros(n)
    if test:
        for i in range(n):
            for j in range(m):
                if v[j]:
                    res[i] += A[i,j]*v[j]
    else:
        for i in range(n):
            for j in range(m):
                res[i] += A[i,j]*v[j]
    return res

Obviously this won't be as fast as the compiled dot, but I expect the relative advantages of testing still apply. For further testing you could implement it in cython.

Notice that the v[j] test occurs deep in the iteration.

For a sparse v (3 out of 100 elements) testing saves time:

In [374]: timeit dotit(A,v,True)
100 loops, best of 3: 3.81 ms per loop

In [375]: timeit dotit(A,v,False)
10 loops, best of 3: 21.1 ms per loop

but it costs time if v is dense:

In [376]: timeit dotit(A,np.arange(100),False)
10 loops, best of 3: 22.7 ms per loop

In [377]: timeit dotit(A,np.arange(100),True)
10 loops, best of 3: 25.6 ms per loop

Post Status

Asked in February 2016
Viewed 1,454 times
Voted 4
Answered 2 times

Search




Leave an answer