Sentences and paragraphs have different splitting criteria.
Sentences end with full stops, question marks, and exclamation marks.
They can be separated with dashes and other punctuation, but we won't worry about these rare cases here.
Instead of asking Ruby to split the text on one type of character, you simply ask it to split on any of three types of characters, like so:
lines = File.readlines("main.rb")
line_count = lines.size
text = lines.join
sentence_count = text.split(/\.|\?|!/).length
Let's look at the regular expression directly:
/\.|\?|!/
The forward slashes at the start and the end are the usual delimiters for a regular expression, so those can be ignored.
The first section is \., and this represents a full stop.
You can't just use . without the backslash.
. represents "any character" in a regular expression, so it needs to be escaped with the backslash to identify itself as a literal full stop.
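To see why the backslash matters, here is a small comparison using the illustrative string "a.b". With an unescaped . every character counts as a separator, so nothing is left between the matches. Because split drops trailing empty strings, the result is an empty array:

```ruby
# Unescaped: . matches any character, so every character is a separator
unescaped = "a.b".split(/./)

# Escaped: \. matches only a literal full stop
escaped = "a.b".split(/\./)

p unescaped # => []
p escaped   # => ["a", "b"]
```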
A question mark in a regular expression usually means "zero or one instances of the previous character", so it must also be escaped with a backslash (\?) to match a literal question mark.
The ! is not escaped, as on its own it has no special meaning in a regular expression.
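Ruby's built-in Regexp.escape offers a quick way to confirm which of the three characters need a backslash: it escapes only characters with a special meaning and leaves the rest alone. The snippet also shows that a bare ? is rejected outright, because there is no previous character for it to apply to:

```ruby
# Regexp.escape adds a backslash only where a character is special:
# . and ? are escaped, ! is left as it is
puts Regexp.escape(".?!") # prints \.\?!

# A lone ? has nothing to repeat, so building the regexp raises an error
begin
  Regexp.new("?")
rescue RegexpError => e
  puts "RegexpError: #{e.message}"
end
```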
The pipes (|) separate the three alternatives, so split breaks the text wherever any one of the three characters appears.
puts "Test! I. It? Yes.".split(/\.|\?|!/).length #4
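The same split can also be written as a character class, /[.?!]/. Inside square brackets . and ? lose their special meaning, so no backslashes are needed; this is an alternative notation, not the one used above. Adding + ("one or more") is a possible refinement when punctuation is doubled up: with the single-character pattern, a run such as ... or ?! leaves empty strings between the marks, and each one is counted as a "sentence". The sample strings below are illustrative:

```ruby
# The character class gives the same pieces as the alternation
sample = "Test! I. It? Yes."
p sample.split(/[.?!]/) == sample.split(/\.|\?|!/) # => true

text = "Wait... what?!"

# Alternation, one punctuation character at a time
single = text.split(/\.|\?|!/)

# Character class with +, so a run of punctuation acts as one separator
runs = text.split(/[.?!]+/)

p single # => ["Wait", "", "", " what"]
p runs   # => ["Wait", " what"]
```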
Paragraphs are usually separated by a blank line, which is two newline characters in a row, so they can be split on a double newline (\n\n). For example:
text = %q{
This is a test of paragraph one.

This is a test of paragraph two.

This is a test of paragraph three.
}
puts text.split(/\n\n/).length #3
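Splitting on exactly two newlines assumes paragraphs are separated by a single blank line. If the text may contain several blank lines in a row, a quantifier such as \n{2,} (two or more newlines) treats the whole gap as one separator. This is a suggested variation with an illustrative string, not part of the original program:

```ruby
text = "Paragraph one.\n\n\n\nParagraph two."

# Exactly two newlines: the extra blank lines leave an empty string behind
p text.split(/\n\n/).length   # => 3

# Two or more newlines: the whole gap counts as one separator
p text.split(/\n{2,}/).length # => 2
```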
lines = File.readlines("main.rb")
line_count = lines.size
text = lines.join

paragraph_count = text.split(/\n\n/).length
puts "#{paragraph_count} paragraphs"

sentence_count = text.split(/\.|\?|!/).length
puts "#{sentence_count} sentences"
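The program above needs main.rb to exist. For experimenting, the same counts can be taken from a string instead. In this sketch the method name text_stats and the sample text are hypothetical, and one change is deliberate: the text is stripped before counting sentences, because whitespace after the final punctuation mark (such as a file's trailing newline) would otherwise be split off as an extra "sentence".

```ruby
# Hypothetical helper: counts lines, paragraphs and sentences in a string
def text_stats(text)
  {
    lines: text.lines.size,
    paragraphs: text.split(/\n\n/).length,
    # strip first so trailing whitespace is not counted as a sentence
    sentences: text.strip.split(/\.|\?|!/).length
  }
end

sample = <<~TEXT
  This is the first paragraph. It has two sentences.

  Is this the second paragraph? Yes!
TEXT

stats = text_stats(sample)
puts "#{stats[:lines]} lines"
puts "#{stats[:paragraphs]} paragraphs"
puts "#{stats[:sentences]} sentences"
```

For this sample it prints 3 lines, 2 paragraphs and 4 sentences; without the strip, the trailing newline would raise the sentence count to 5.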