Berk U. February 2016
### Numpy: does matrix-vector multiplication not skip computation when some vector elements are equal to zero?

I've recently been working on a project where a majority of my time is spent multiplying a dense matrix `A` and a sparse vector `v` (see here). In my attempts to reduce computation, I've noticed that the runtime of `A.dot(v)` is not affected by the number of zero entries of `v`.

To explain why I would expect the runtime to improve in this case, let `result = A.dot(v)`, so that `result[i] = sum_j(A[i,j]*v[j]) for i = 1...A.shape[0]`. If `v[j] = 0`, then the term `A[i,j]*v[j]` is zero for every `i`, so the entire column `A[:,j]` contributes nothing to `result`. In this case, I would therefore expect numpy to skip the products involving column `j`, but it seems as if it goes ahead and computes `A[i,j]*v[j]` for every `i` anyways.

I went ahead and wrote a short sample script, shown below, to confirm this behavior.

```
import time
import numpy as np

np.__config__.show()  # make sure BLAS/LAPACK is being used
np.random.seed(seed=0)
n_rows, n_cols = 100000, 1000  # shape arguments must be ints, not 1e5/1e3 floats

# initialize matrix and vector
A = np.random.rand(n_rows, n_cols)
u = np.random.rand(n_cols)
u = np.require(u, dtype=A.dtype, requirements=['C'])

# time the fully dense product
start_time = time.time()
A.dot(u)
print("time with %d zero entries: %1.5f seconds" % (sum(u == 0.0), time.time() - start_time))

# set all but two entries of a copy of u to zero
v = u.copy()  # copy, so that u itself is left untouched
set_to_zero = np.random.choice(np.arange(u.shape[0]), size=(u.shape[0] - 2), replace=False)
v[set_to_zero] = 0.0
start_time = time.time()
A.dot(v)
print("time with %d zero entries: %1.5f seconds" % (sum(v == 0.0), time.time() - start_time))

# what I would really expect it to take
non_zero_index = np.squeeze(v != 0.0)
A_effective = A[:, non_zero_index]
v_effective = v[non_zero_index]
start_time = time.time()
A_effective.dot(v_effective)
print("expected time with %d zero entries: %1.5f seconds" % (sum(v == 0.0), time.time() - start_time))
```
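As a sanity check on the reasoning above (my own addition, not part of the original script), dropping the zero columns provably leaves the product unchanged; a minimal sketch:

```python
import numpy as np

np.random.seed(0)
A = np.random.rand(50, 20)
v = np.zeros(20)
v[[3, 7]] = np.random.rand(2)      # only two nonzero entries

nz = v != 0.0
full = A.dot(v)                    # dense product over all columns
reduced = A[:, nz].dot(v[nz])      # product over the nonzero columns only
print(np.allclose(full, reduced))  # True
```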

### Answers

thodnev February 2016
For plain dense arrays, NumPy doesn't perform such optimizations, but if you need them you can use sparse matrices, which may improve dot-product timings in that case.
For more on the topic, see: http://docs.scipy.org/doc/scipy/reference/sparse.html
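To illustrate the suggestion (a sketch of my own, not from the answer): storing `v` in CSR format keeps only its nonzero entries, so the product only touches the corresponding columns of `A`, while the result matches the dense `A.dot(v)`:

```python
import numpy as np
from scipy import sparse

np.random.seed(0)
A = np.random.rand(1000, 500)
v = np.zeros(500)
v[np.random.choice(500, size=5, replace=False)] = np.random.rand(5)

# CSR stores only the 5 nonzero entries of v, so the multiplication
# below only involves the 5 corresponding columns of A
v_csr = sparse.csr_matrix(v)                   # shape (1, 500)
result = np.asarray(v_csr.dot(A.T)).ravel()    # same values as A.dot(v)
print(np.allclose(result, A.dot(v)))           # True
```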

hpaulj February 2016
How about experimenting with a simple function like this?

```
def dot2(A, v):
    ind = np.where(v)[0]
    return np.dot(A[:, ind], v[ind])
```

```
In [352]: A=np.ones((100,100))
In [360]: timeit v=np.zeros((100,));v[::60]=1;dot2(A,v)
10000 loops, best of 3: 35.4 us per loop
In [362]: timeit v=np.zeros((100,));v[::40]=1;dot2(A,v)
10000 loops, best of 3: 40.1 us per loop
In [364]: timeit v=np.zeros((100,));v[::20]=1;dot2(A,v)
10000 loops, best of 3: 46.5 us per loop
In [365]: timeit v=np.zeros((100,));v[::60]=1;np.dot(A,v)
10000 loops, best of 3: 29.2 us per loop
In [366]: timeit v=np.zeros((100,));v[::20]=1;np.dot(A,v)
10000 loops, best of 3: 28.7 us per loop
```

A fully iterative Python implementation would be:

```
def dotit(A, v, test=False):
    n, m = A.shape
    res = np.zeros(n)
    if test:
        for i in range(n):
            for j in range(m):
                if v[j]:
                    res[i] += A[i,j]*v[j]
    else:
        for i in range(n):
            for j in range(m):
                res[i] += A[i,j]*v[j]
    return res
```

Obviously this won't be as fast as the compiled `dot`, but I expect the relative advantages of testing still apply. For further testing you could implement it in `cython`.

Notice that the `v[j]` test occurs deep in the iteration.
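One way to hoist that test out of the inner loop (my own sketch, not from the answer) is to enumerate the nonzero columns of `v` up front and iterate over those alone:

```python
import numpy as np

def dot_hoisted(A, v):
    # iterate only over the nonzero columns of v, so the zero test
    # happens once per column instead of once per element
    n, m = A.shape
    res = np.zeros(n)
    for j in np.nonzero(v)[0]:
        vj = v[j]
        for i in range(n):
            res[i] += A[i, j] * vj
    return res

A = np.ones((100, 100))
v = np.zeros(100)
v[::60] = 1                                     # 2 nonzero entries
print(np.allclose(dot_hoisted(A, v), np.dot(A, v)))  # True
```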

For a sparse `v` (3 out of 100 elements) testing saves time:

```
In [374]: timeit dotit(A,v,True)
100 loops, best of 3: 3.81 ms per loop
In [375]: timeit dotit(A,v,False)
10 loops, best of 3: 21.1 ms per loop
```

but it costs time if `v` is dense:

```
In [376]: timeit dotit(A,np.arange(100),False)
10 loops, best of 3: 22.7 ms per loop
In [377]: timeit dotit(A,np.arange(100),True)
10 loops, best of 3: 25.6 ms per loop
```

#### Post Status

Asked in February 2016

Viewed 1,454 times

Voted 4

Answered 2 times