Identifying Duplicate Words in a String
Decompose the string into individual words, and then count the occurrences of each
word:
<?php
// define string
$str = "baa baa black sheep";
// trim the whitespace at the ends of the string
$str = trim($str);
// compress the whitespace in the middle of the string
$str = preg_replace('/[[:space:]]+/', ' ', $str);
// decompose the string into an array of "words"
$words = explode(' ', $str);
// iterate over the array
// count occurrences of each word
// save stats to another array
$wordStats = array();
foreach ($words as $w) {
    $key = strtolower($w);
    // start the count at 1 the first time a word is seen
    $wordStats[$key] = isset($wordStats[$key]) ? $wordStats[$key] + 1 : 1;
}
// print all duplicate words
// result: "baa"
foreach ($wordStats as $k => $v) {
    if ($v >= 2) { print "$k \r\n"; }
}
?>
Comments
The first task here is to identify the individual words in the sentence or paragraph. You
accomplish this by trimming the string, compressing each run of whitespace into a single
space, and then decomposing the sentence into words with explode(), using a single space
as the delimiter.
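As a possible alternative to the trim, compress, and explode sequence described above, the same word list can be produced in one call with preg_split(). This is only a sketch and is not part of the original listing; the sample string with irregular spacing is an assumption chosen to show the effect.

```php
<?php
// Sketch: split on runs of whitespace in a single step.
// PREG_SPLIT_NO_EMPTY drops the empty pieces that leading or
// trailing whitespace would otherwise produce, so no trim() is needed.
$str = "  baa   baa black\tsheep ";
$words = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);

// $words now holds: baa, baa, black, sheep
print_r($words);
```

The explicit three-step version is easier to follow when learning; the single call is shorter once the pattern is familiar.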
Next, a new associative array, $wordStats, is initialized and a key is created within
it for every word in the original string. Keys are lowercased with strtolower(), so the
count ignores case. Each time a word occurs, the value corresponding to that word’s key
in the $wordStats array is incremented by 1.
Once all the words in the string have been processed, the $wordStats array will
contain a list of unique words from the original string, together with a number
indicating each word’s frequency. It is now a simple matter to isolate those keys with
values greater than 1, and print the corresponding words as a list of duplicates.
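
The counting and filtering steps described above could also be packaged as a reusable function built on PHP's array_count_values(). The sketch below is one way to do that, not the listing's own approach; the function name findDuplicateWords() and the sample sentence are hypothetical.

```php
<?php
// Hypothetical helper (not from the original listing): returns the
// lowercased words that occur more than once in $str.
function findDuplicateWords($str) {
    // split on whitespace, discarding empty pieces
    $words = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
    // build the word => frequency map, ignoring case
    $wordStats = array_count_values(array_map('strtolower', $words));
    // keep only the words seen at least twice
    $duplicates = array();
    foreach ($wordStats as $word => $count) {
        if ($count >= 2) {
            // cast because PHP turns numeric-string keys into integers
            $duplicates[] = (string) $word;
        }
    }
    return $duplicates;
}

// result: "baa, black"
print implode(", ", findDuplicateWords("Baa baa black sheep BLACK")) . "\r\n";
```

Note that this sketch, like the listing, treats punctuation as part of a word, so "sheep," and "sheep" would be counted separately.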