knifer February 2016

Faster way to run a loop in R

I have three data frames: A, B, and C.

A has 18,000 rows and 18,000 columns, and B has 150,000 rows and 5 columns.

I want to fill elements of A with values from B.

The loop takes a long time. How can I run this loop faster?

Example of A:

Entrez_Gene_Id 2324 34345 4345 1234 3453
1 Entrez_Gene_Id    0     0    0    0    0
2          23040    0     0    0    0    0
3           7249    0     0    0    0    0
4          64478    0     0    0    0    0
5           4928    0     0    0    0    0
6          58191    0     0    0    0    0

Example of B:

  head(B)
  V1 Gene1 Gene2      weight   newWeight
1  1  4171  4172  2.01676494 0.020420929
2  2  2237  5111 1.933298567 0.015300857
3  4   506   509 2.439170425 0.020577243
4  7  6635  6636 2.255316779 0.081088975
5  8  6133  6210 3.427969232 0.021132906
6 10 23521  6217 1.607247743 0.027792961   

And this is my code:

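B <- data.frame(lapply(C, as.character), stringsAsFactors = FALSE)

for (i in 1:nrow(B)) {
  Rname <- B[i, 2]            # Gene1, used as the row label in A
  Cname <- B[i, 3]            # Gene2, used as the column label in A
  A[Rname, Cname] <- B[i, 5]  # newWeight
  print(i)
}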

Answers


Chris February 2016

It seems as though you are trying to fill a full matrix from a matrix stored in sparse (row, column, value) notation. You can use the sparseMatrix function from the Matrix package to do this; it builds an object of class dgCMatrix:

library(Matrix)
b_mat <- sparseMatrix(i=B[,2],j=B[,3],x=B[,5])

This leaves the matrix in sparse format. To convert it to the full 18,000 x 18,000 form:

as.data.frame(as.matrix(b_mat))
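
One caveat: sparseMatrix() treats i and j as row and column positions, not as labels, so passing gene IDs directly would size the matrix by the largest ID rather than 18,000 x 18,000. A minimal sketch of one way around this, assuming the IDs are first mapped to positions with match(). The `genes` vector and the small `B` below are made-up toy data standing in for the row/column labels of A and the real pair list:

```r
library(Matrix)

# Hypothetical toy data: `genes` stands in for the row/column labels of A.
genes <- c("4171", "4172", "2237", "5111")
B <- data.frame(
  Gene1     = c("4171", "2237", "4172", "9999"),  # "9999" is not in `genes`
  Gene2     = c("4172", "5111", "2237", "4171"),
  newWeight = c(0.5, 0.25, 0.125, 1),
  stringsAsFactors = FALSE
)

# Translate gene IDs into positions within `genes`.
ri <- match(B$Gene1, genes)
ci <- match(B$Gene2, genes)

# Drop pairs with a gene that is missing from `genes` (match() gives NA).
keep <- !is.na(ri) & !is.na(ci)

# Fix the dimensions explicitly and attach the gene IDs as dimnames.
b_mat <- sparseMatrix(
  i = ri[keep], j = ci[keep], x = B$newWeight[keep],
  dims = c(length(genes), length(genes)),
  dimnames = list(genes, genes)
)

b_mat["4171", "4172"]  # look up a weight by gene ID
```

Note that sparseMatrix() sums the values of duplicated (i, j) pairs by default, whereas the loop in the question keeps the last value written; if B can contain repeated pairs, the use.last.ij argument is worth checking.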

EDIT: I would suggest leaving out the as.data.frame call, as a plain matrix is easier to work with given the number of columns you have.
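
If a dense result is wanted anyway, the loop from the question can probably be replaced by a single vectorized assignment, since a base R matrix accepts a two-column matrix of (row, column) positions as an index. A rough sketch, assuming A is a numeric matrix whose rows and columns are labeled by gene ID; the toy `genes`, `A` and `B` below are hypothetical stand-ins for the real data:

```r
# Hypothetical toy data standing in for the real A and B.
genes <- c("4171", "4172", "2237", "5111")
A <- matrix(0, nrow = length(genes), ncol = length(genes),
            dimnames = list(genes, genes))
B <- data.frame(
  Gene1     = c("4171", "2237", "4172", "9999"),  # "9999" has no row in A
  Gene2     = c("4172", "5111", "2237", "4171"),
  newWeight = c(0.5, 0.25, 0.125, 1),
  stringsAsFactors = FALSE
)

# One (row, column) position pair per row of B.
idx <- cbind(match(B$Gene1, rownames(A)),
             match(B$Gene2, colnames(A)))

# Skip pairs that refer to a gene not present in A.
ok <- rowSums(is.na(idx)) == 0

# Assign all weights at once instead of looping over rows of B.
A[idx[ok, , drop = FALSE]] <- B$newWeight[ok]
```

Keep in mind that a dense 18,000 x 18,000 matrix of doubles needs about 2.6 GB of memory (18,000 x 18,000 x 8 bytes), which is one more reason to stay in the sparse format when possible.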

Post Status

Asked in February 2016
Viewed 1,939 times
Voted 10
Answered 1 times
