# Developers Planet

Sujay_K February 2016

### How do I store a 100000*100000 matrix in C++?

I have two vectors `a[100000]` and `b[100000]`. I want to store `a[i]*b[j]` in a matrix. How can I do it in C++?

Pierre February 2016

You can use a `std::vector<std::vector<your_type>>` to store the result.

``````int rows = 100000, cols = 100000;

std::vector<std::vector<double>> result;
result.resize(rows);

for(int i=0; i<rows; i++) {
result[i].resize(cols);
}

for(int i=0; i<rows; i++) {
for(int j=0; j<cols; j++){
result[i][j] = a[i] * b[j];
}
}
``````

Or you can use a linear algebra library, for example Eigen (you'll may have less to code with this), which will surely be more efficient.

Humam Helfawi February 2016

The NON-contiguity part of this answer should be re-researched. It may be wrong.

If you want to work with big number of element such as `100000*100000`. I would not recommend to use `vector of vectors` due to the "NON-contiguity" property of the inner `vectors` elements to each other. Small `push_back` may results in a lot of mess.

I would use single `vector` with a wrapper. See this for more information Clean ways to write multiple 'for' loops .

Kay February 2016

Using an in-RAM `std::vector<double>` would probably not feasible if you have less than 80GB of RAM in your system (for a 100000×100000 matrix of doubles).

Here is how you would do this using an mmap'd file. Please see the inline comments:

``````#include <sys/mman.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

#define ROWS 1000
#define COLS 1000
#define FILENAME "./matrix.doubles"

int main(void)
{
double (*matrix)[ROWS][COLS]; // pointer to our matrix
int fd; // file descriptor of backing file

// open backing file
fd = open(FILENAME,
O_CREAT | O_RDWR, // create (if absent) and/or read and writable
S_IRUSR | S_IWUSR); // (only) user may read and write
if (fd < 0) {
perror("Could not open file");
return 1;
}

if ((lseek(fd, sizeof(*matrix), SEEK_SET) == (off_t) -1) ||
ftruncate(fd, sizeof(*matrix)) ||
(lseek(fd, 0, SEEK_SET) == (off_t) -1)) {
perror("Could not set file size.");
return 1;
}

matrix = mmap(NULL, // I don't care were the address starts
sizeof(*matrix), // size of matrix in bytes
MAP_PRIVATE, // we access the data exclusively
fd, // file descriptor of backing file
0); // offset
if (matrix == MAP_FAILED) {
perror("Could not mmap file.");
return 1;
}

// operate on matrix
for (unsigned row = 0; row < ROWS; ++row) {
for (unsigned col = 0; col < COLS; ++col) {
(*matrix)[row][col] = row * col;
}
}

// close backing file
munmap(matrix, sizeof(*matrix));
close(fd);

return 0;
}
``````

This is pure C code. You could fancy up the code by using e.g.

user3528438 February 2016

``````    #include <vector>
class C
{
public:
C(const std::vector<double>& a_, const std::vector<double>& b_)
:a(a_),b(b_){};
double operator()(size_t i, size_t j) const { return a[i]*b[j]; }
private:
std::vector<double> a, b;
};
``````

What the problem actually is?

The original question asks about a way to save `C(i,j)=A(i)*B(j)` to a matrix.

From an OOP point of view, such an `matrix` can be defined as an object with a method takes two inputs(`i` and `j`), and returns a result (`ret=A(i)*B(j)`).

This could be implemented using nested array subscriptions(`c[i][j]`), or linear array indexing(`c[i*100000+j]`), or a function (`c.get(i, j)`). The third way could also be simplified to a functor (`c.operator()(i, j)` or `c(i, j)`).

Then what?

If you agree to all above that any of the three interfaces serves the purpose, or at least partially (like I mentioned in the comment, if the matrix is only required to provide random read access to its elements). Then we continue to implement one of them, 3rd one being my choice.

Why do it that way?

My observation is that, computing the return value is not expensive, so why not calculate the product "lazily" when the product is actually accessed?

In this way, the storage is very efficient (memory usage is reduced from `n^2` to `2n`).

Hiding the multiplication in the getter function does not significantly increase access time (two memory accesses and one multiplication, compared with the ideal case being one memory access only, but both cases are constant time, and this implementation is much more cache friendly for the reduced size of the data).

hungptit March 2016

If you can compute a[i]*b[j] on the fly then you should do that because of two reasons:

• Obtaining the results from a huge matrix may not be faster than compute the product of two double values on the fly.
• 10000x10000 double matrix requires 80 Gbytes of storage (in-memory or on disk) and some extra work might be needed to access pre-computed data.

In my example below, I see 30x speed up (compiled in release mode using clang 3.8) if I compute the product of two double values on the fly for N=20000.

``````template <typename T>
void test_lookup(std::vector<T> &data, std::vector<size_t> &index,
std::vector<T> &results) {
const size_t LOOP = index.size() / 2;
for (size_t idx = 0; idx < LOOP; ++idx) {
auto row = index[2 * idx];
auto col = index[2 * idx + 1];
results[idx] = data[col * LOOP + row];
}
}

template <typename T>
void test_mul(std::vector<T> &x, std::vector<T> &y, std::vector<T> &results) {
for (size_t idx = 0; idx < x.size(); ++idx) {
results[idx] = x[idx] * y[idx];
}
}
``````

#### Post Status

Viewed 1,574 times
Voted 4