Developers Planet

Peter Nixey February 2016

Finding the length of repeating elements in an array

I would like to find the length of streaks within the following array:

streak_lengths = [3, 3, 3, 2, 2, 4, 4, 4, 4]

The array opens with three :read elements, and we label each of those as being part of a 3 streak, it then has two :unread elements, so they're each labelled with a 2 streak, and then finally we have a streak of four read messages, so they're each labelled with a 4.

What is an elegant, efficient and readable way to solve the above problem?

Is it a recursion problem? While I can solve this problem, I feel that it hints at a way of tackling it that I'm not familiar with. It hints that perhaps it's something best solved with recursion.

(For the benefit of the "possibly duplicate" flag applied to this: both threads actually have slightly different discussions. In addition from a search perspective, you would only find the other thread if you search for repeating characters, this the answer to how you detect repeating array elements. Finally there are a load of excellent answers on here, removing this question wouldn't make the ecosystem richer, just poorer)

sawa February 2016

chunk segments into an array consecutive elements that have identical return value when the block is called on them. flat_map concats the arrays returned by the block into a single one.

states.chunk(&:itself).flat_map{|_, a| Array.new(a.length, a.length)}
# => [3, 3, 3, 2, 2, 4, 4, 4, 4]

Perhaps you can do it in a recursive way if you want, but I don't think that would lead to an elegant solution. From my experience, it is better to avoid recursion whenever possible.

Cary Swoveland February 2016

With Ruby v2.2

states.slice_when { |a,b| a != b }.flat_map { |a| [a.size]*a.size }
#=> [3, 3, 3, 2, 2, 4, 4, 4, 4]

With Ruby v2.3

states.chunk_while { |a,b| a == b }.flat_map { |a| [a.size]*a.size }
#=> [3, 3, 3, 2, 2, 4, 4, 4, 4]

but neither of these offer any advantage over plain old chunk.

Kimmo Lehto February 2016

This is practically what the most simple compression algorithms do and in that context it is called Run-Length Encoding or RLE in short.

There's a couple of Ruby examples of how to implement RLE in Ruby found on RosettaCode's Wiki Page on RLE

The most relevant example concerning your question is probably the first one:

# run_encode("aaabbbbc") #=> [["a", 3], ["b", 4], ["c", 1]]
def run_encode(string)
string
.chars
.chunk{|i| i}
.map {|kind, array| [kind, array.length]}
end

This uses the chunk method found in Ruby's own Enumerator class that is common for classes you can enumerate on, such as Arrays and Hashes. It Enumerates over the items, chunking them together based on the return value of the block.

Learning from the example, we can get the kind of output you asked for with the following code (flat_map usage borrowed from @sawa 's answer since mine didn't actually return [2, 2] from [:read, :read] but only 2):

states.chunk{|a| a}.flat_map{|_, a| [a.length] * a.length}
# outputs [3, 3, 3, 2, 2, 4, 4, 4, 4]

Wand Maker February 2016

All nice answers are already posted, hence, I am posting another way to do this - it takes the advantage of the fact if array elements are references to another existing array, then, modification to that existing array will be visible to all elements that reference it.

For example:

a = [1]
b = [a, a, a, a]
#=> [[1], [1], [1], [1]]
a[0] = 2
b
#=> [[2], [2], [2], [2]]

t = [1]
result = []
states.each_with_index do |ele, i|
result[i] = t
if i + 1 < states.size and states[i+1] == states[i]
result[i][0] += 1
else
t = [1]
end
end

p result.flatten
#=> [3, 3, 3, 2, 2, 4, 4, 4, 4]