BurgerBurglar February 2016

Is heap sort supposed to be very slow on MATLAB?

I wrote a heap sort function in MATLAB and it works fine, except that when the length of input is greater or equal to 1000, it can take a long time (e.g. the length of 1000 takes half a second). I'm not sure if it's that MATLAB doesn't run very fast on heap sort algorithm or it's just my code needs to be improved. My code is shown below:

function b = heapsort(a)

[~,n] = size(a);
b = zeros(1,n);
for i = 1:n
    a = build_max_heap(a);
    b(n+1-i) = a(1);

    temp = a(1);
    a(1) = a(n+1-i);
    a(n+1-i) = temp;

    a(n+1-i) = [];
    a = heapify(a,1);
end
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function a = build_max_heap(a)
[~,n] = size(a);
m = floor(n/2);
for i = m:-1:1
    a = heapify(a,i);
end
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function a = heapify(a,i)
[~,n] = size(a);

left = 2*i;
right = 2*i + 1;

if left <= n 
    if a(left) >= a(i)
        large = left;
    else
        large = i;
    end
else
    return
end
if right <= n
    if a(right) >= a(large)
        large = right;
    end
end

if large ~= i
    temp = a(large);
    a(large) = a(i);
    a(i) = temp;
    a = heapify(a,large);
end
end

I'm aware that maybe it's the code a(n+1-i) = []; that may consume a lot of time. But when I changed the [] into -999 (lower than any number of the input vector), it doesn't help but took even more time.

Answers


Stewie Griffin February 2016

You should use the profiler to check which lines that takes the most time. It's definitely a(n+1-i) = []; that's slowing down your function.

Resizing arrays in loops is very slow, so you should always try to avoid it.

A simple test:

  • Create a function that takes a large vector as input, and iteratively removes elements until it's empty.
  • Create a function that takes the same vector as input and iteratively sets each value to 0, Inf, NaN or something else.

Use timeit to check which function is faster. You'll see that the last function is approximately 100 times faster (depending on the size of the vector of course).

The reason why -999 takes more time is most likely because a no longer gets smaller and smaller, thus a = heapify(a,1); won't get faster and faster. I haven't tested it, but if you try the following in your first function you'll probably get a much faster program (you must insert the n+1-i) other places in your code as well, but I'll leave that to you):

a(n+1-ii) = NaN;
a(1:n+1-ii) = heapify(a(1:n+1-ii),1);

Note that I changed i to ii. That's partially because I want to give you a good advice, and partially to avoid being reminded to not is i and j as variables in MATLAB.

Post Status

Asked in February 2016
Viewed 2,528 times
Voted 14
Answered 1 times

Search




Leave an answer