Moe February 2016

How to count how many times a character appears in a row

My code works well for a regular character count:

count = Hash.new(0)
str.each_char do |char|
    count[char] += 1 unless char == " "
end
count

For example, "aaabbaaaaacccbbdddd" gives 'a' = 8, 'b' = 4, 'c' = 3, 'd' = 4.

I want to count how many times each character occurs in a row. The results I want are: 'a' = 3, 'b' = 2, 'a' = 5, 'c' = 3, 'b' = 2, and 'd' = 4. How can I do this?

Answers


Gavriel February 2016

To find the length of the longest run of each character:

count = Hash.new(0)
last_char = nil
occurred = 0
str.each_char do |char|
    if char != last_char
      occurred = 1
    else
      occurred += 1
    end
    last_char = char
    count[char] = occurred if (count[char]||0) < occurred
end
count

Or to get the result like [['a',3],['b',2],['a',5],['c',3],['b',2],['d',4]]:

count = []
last_char = nil
occurred = 0
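str.each_char do |char|
    if char != last_char
      # Record the run that just ended; skip the initial nil placeholder
      count.push([last_char, occurred]) unless last_char.nil?
      occurred = 1
    else
      occurred += 1
    end
    last_char = char
end
# Record the final run (nothing to record for an empty string)
count.push([last_char, occurred]) unless last_char.nil?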
count
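
For the longest-run version at the top of this answer, a shorter sketch is possible with chunk. The method name max_runs is only an illustrative choice, not something from the question, and the block form `{ |char| char }` is used here in place of &:itself so it does not depend on Ruby 2.2.

```ruby
# Hypothetical helper: longest run length per character.
def max_runs(str)
  # chunk groups consecutive equal characters; keep the largest group size seen
  str.each_char.chunk { |char| char }.each_with_object(Hash.new(0)) do |(char, group), longest|
    longest[char] = [longest[char], group.length].max
  end
end

max_runs("aaabbaaaaacccbbdddd")
#=> {"a"=>5, "b"=>2, "c"=>3, "d"=>4}
```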


spickermann February 2016

What about:

string.split(//).slice_when { |a, b| a != b }.
       map { |group| [group.first, group.size] }

#=> [['a', 3], ['b', 2], ['a', 5], ['c', 3], ['b', 2], ['d', 4]]


Schwern February 2016

Instead of a hash, use an array to store pairs as you see them.

str = "aaabbaaaaacccbbdddd"

counts = []
str.each_char do |char|
  # Get the last seen character and count pair
  last_pair = counts[-1] || []

  if last_pair[0] == char
    # This character is the same as the last one, increment its count
    last_pair[1] += 1
  else
    # New character, push a new pair onto the list
    counts.push([char, 1])
  end

end

counts.each { |c|
  puts "#{c[0]} = #{c[1]}"
}

This can be written much more concisely using chunk.

str = "aaabbaaaaacccbbdddd"
counts = []
str.chars.chunk(&:itself).each { |char, chars|
  counts << [char, chars.length]
}
puts counts.inspect

chunk splits a list into chunks. It decides this by calling the block on each element. As long as the block returns the same value as the previous value, it will add to the current chunk. Once it changes, it makes a new chunk. This is similar to what we were doing in the loop before by storing the last seen character.

  if last_seen == char
    # it's the same chunk
  else
    # it's a new chunk
    last_seen = char
  end

itself returns the object it is called on, so here the block hands each character right back. chunk(&:itself) therefore splits the characters into runs of identical characters.

Each element that chunk yields is a pair: the block's return value for that run (in our case the character) and an array of the elements in the run (for example ["a", "a", "a"]).
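
To make the shape of those pairs concrete, here is a small sketch that materializes what chunk produces before counting. The variable names pairs, first_key and first_group are just for illustration.

```ruby
str = "aaabbaaaaacccbbdddd"

# to_a turns the enumerator returned by chunk into an array of pairs
pairs = str.chars.chunk(&:itself).to_a

# The first pair holds the block's value and the elements of the first run
first_key, first_group = pairs.first
# first_key   #=> "a"
# first_group #=> ["a", "a", "a"]

# Reduce each pair to [character, run length]
counts = pairs.map { |char, group| [char, group.length] }
#=> [["a", 3], ["b", 2], ["a", 5], ["c", 3], ["b", 2], ["d", 4]]
```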


sawa February 2016

"aaabbaaaaacccbbdddd".each_char.chunk(&:itself).map{|k, v| [k, v.length]}
# => [["a", 3], ["b", 2], ["a", 5], ["c", 3], ["b", 2], ["d", 4]]

I benchmarked the solutions from sawa and spickermann:

require 'benchmark/ips'

def sawa(string)
  string.each_char.chunk(&:itself).map{|k, v| [k, v.length] }
end

def spickermann(string)
  string.split(//).slice_when { |a, b| a != b }.map { |group| [group.first, group.size] }
end

Benchmark.ips do |x|
  string = "aaabbaaaaacccbbdddd"

  x.report("sawa") { sawa string }
  x.report("spickermann") { spickermann string }

  x.compare!
end

# Calculating -------------------------------------
#                 sawa     6.293k i/100ms
#          spickermann     4.447k i/100ms
# -------------------------------------------------
#                 sawa     75.353k (±10.4%) i/s -    371.287k
#          spickermann     48.661k (±12.0%) i/s -    240.138k
# 
# Comparison:
#                 sawa:    75353.5 i/s
#          spickermann:    48660.7 i/s - 1.55x slower


Tasos Stathopoulos February 2016

I prefer regular expressions for this kind of problem:

str = "aaabbaaaaacccbbdddd"
# \k<char>* (rather than +) so that a character occurring only once in a row is still counted
counts = str.scan(/(?<seq>(?<char>\w)\k<char>*)/).inject([]) do |occurs, match|
  occurs << [match[1], match[0].size]

  occurs
end
puts counts.inspect #=>[["a", 3], ["b", 2], ["a", 5], ["c", 3], ["b", 2], ["d", 4]]
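
One detail worth checking with the regex approach is the quantifier on the backreference, since it decides whether a run of length one is matched at all. A minimal sketch comparing the two quantifiers (runs_plus and runs_star are made-up names for this comparison):

```ruby
# Hypothetical helpers, for comparison only.
def runs_plus(str)
  # \k<char>+ needs at least one repeat, so runs of length 1 are skipped
  str.scan(/(?<seq>(?<char>\w)\k<char>+)/).map { |seq, char| [char, seq.size] }
end

def runs_star(str)
  # \k<char>* allows zero repeats, so single characters are kept
  str.scan(/(?<seq>(?<char>\w)\k<char>*)/).map { |seq, char| [char, seq.size] }
end

runs_plus("abbc") #=> [["b", 2]]
runs_star("abbc") #=> [["a", 1], ["b", 2], ["c", 1]]
```

Both give the same result for "aaabbaaaaacccbbdddd", because that string has no single-character runs. Note also that \w only matches word characters, so spaces and punctuation are ignored either way.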

Edit:

I ran the same benchmark as @sawa and added the regular expression approach. In the run below it comes out fastest, about 1.9x faster than sawa's version. Also note that #itself is not available in Ruby versions before 2.2.

require 'benchmark/ips'

def sawa(string)
  string.each_char.chunk(&:itself).map{|k, v| [k, v.length] }
end

def spickermann(string)
  string.split(//).slice_when { |a, b| a != b }.map { |group| [group.first, group.size] }
end

def stathopa(string)
  string.scan(/(?<seq>(?<char>\w)\k<char>+)/).inject([]) do |occurs, match|
    occurs << [match[1], match[0].size]

    occurs
  end
end

Benchmark.ips do |x|
  string = "aaabbaaaaacccbbdddd"

  x.report("sawa") { sawa string }
  x.report("spickerman") { spickermann string }
  x.report("stathopa") { stathopa string }

  x.compare!
end

# Calculating -------------------------------------
#                 sawa     6.730k i/100ms
#           spickerman     4.061k i/100ms
#             stathopa    11.969k i/100ms
# -------------------------------------------------
#                 sawa     70.072k (± 8.9%) i/s -    349.960k
#           spickerman     43.652k (± 9.5%) i/s -    219.294k
#             stathopa    132.992k (± 8.8%) i/s -    670.264k
# 
# Comparison:
#             stathopa:   132992.1 i/s
#                 sawa:    70072.4 i/s - 1.90x slower
#           spickerman:    43651.6 i/s - 3.05x slower
# 


Wand Maker February 2016

Here is one way to do this:

s = "aaabbaaaaacccbbdddd"
s.chars.uniq.map do |c|
  p [c, s.split(/[^#{c}]+/).reject(&:empty?).map(&:size)]
end.to_h
#=> {"a"=>[3, 5], "b"=>[2, 2], "c"=>[3], "d"=>[4]}
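
The same grouped-by-character hash can probably also be built in a single pass with chunk, which avoids interpolating the character into a regex (that would misbehave for characters such as ^ or ]). This is only a sketch, and runs_by_char is a name invented here:

```ruby
# Hypothetical helper: run lengths collected per character, in order of appearance.
def runs_by_char(str)
  str.each_char.chunk { |char| char }.each_with_object({}) do |(char, group), runs|
    # Start a list for a character the first time it is seen, then append the run length
    (runs[char] ||= []) << group.length
  end
end

runs_by_char("aaabbaaaaacccbbdddd")
#=> {"a"=>[3, 5], "b"=>[2, 2], "c"=>[3], "d"=>[4]}
```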

Post Status

Asked in February 2016
Viewed 1,145 times
Voted 12
Answered 6 times
