A pretty good heuristic for matching English words.


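class String
  # Returns a hash mapping each lowercased word to its number of occurrences.
  def word_count
    frequencies = Hash.new(0)
    # A word is a run of \w characters, optionally joined by -, ' or .
    # (for example "well-known" or "don't"). The inner group is ignored.
    downcase.scan(/(\w+([-'.]\w+)*)/) { |word, _ignore| frequencies[word] += 1 }
    frequencies
  end
end

# Print the result so the example produces visible output.
p "this is a test.".word_count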
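
The following is a minimal sketch of how the same heuristic treats hyphens, apostrophes, and internal periods. It assumes a standalone helper instead of reopening String; the names WORD_PATTERN and count_words are hypothetical and only used for illustration. The pattern is the one above, rewritten with a non-capturing group so that scan yields a single string per match.

```ruby
# Hypothetical constant: the same pattern as above, with a non-capturing
# group so that scan yields one string per match instead of a pair.
WORD_PATTERN = /\w+(?:[-'.]\w+)*/

# Hypothetical helper: counts lowercased words in text.
def count_words(text)
  counts = Hash.new(0)
  text.downcase.scan(WORD_PATTERN) { |word| counts[word] += 1 }
  counts
end

sample = "Don't over-think it; the U.S.A. is well-known. Don't!"
p count_words(sample)
```

With this sample, "don't" is counted twice, while "over-think", "u.s.a", and "well-known" are each kept as a single word. Trailing punctuation is dropped because a joiner character only counts when more word characters follow it.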

Related examples in the same category

1. Extract numbers from a string
2. Use \d to match any digit; the + that follows \d makes it match as many digits in a row as possible
3. Scan through all the vowels in a string: [aeiou] means "match any of a, e, i, o, or u"
4. Specify ranges of characters inside the square brackets
5. Scan as split
6. Splitting sentences into words
7. Scan a here document
8. Scan for \w+
9. scan() a string with a hex value
10. scan(/./u) a string with a hex value
11. Count words for a string with quotation marks
12. Just like /\w+/, but doesn't consider the underscore part of a word
13. Anything that's not whitespace is a word
14. Accept dashes and apostrophes as parts of words