Here’s a code snippet fresh from my PHP n-gram search class. The $str argument
expects a string, $size is the length of the desired n-gram, and $clean lets us
opt out of the “clean-up” step, which strips non-alphanumeric characters from the
string, lowercases it, and removes duplicate n-grams. Fragments near the end of the
string that are shorter than $size are kept as long as they are at least two
characters long. The method both returns the array of n-grams and stores it in a
class property.
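<?php
// Method excerpt from the n-gram search class.
public function get_ngrams($str, $size = 5, $clean = true)
{
    if ($clean) {
        // Strip non-alphanumeric characters and lowercase the input.
        $str = strtolower(preg_replace("/[^A-Za-z0-9]/", '', $str));
    }

    // Initialize so an empty or one-character string yields an empty array.
    $arrNgrams = array();

    for ($i = 0; $i < strlen($str); $i++) {
        $potential_ngram = substr($str, $i, $size);
        // Keep trailing fragments shorter than $size, but skip single characters.
        if (strlen($potential_ngram) > 1) {
            $arrNgrams[] = $potential_ngram;
        }
    }

    if ($clean) {
        // Drop duplicate n-grams.
        $arrNgrams = array_unique($arrNgrams);
    }

    $this->arrNgrams = $arrNgrams;
    return $arrNgrams;
}
?>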
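To see what the method produces, here is a minimal standalone sketch of the same logic as a plain function. The name ngrams_sketch is hypothetical and not part of the class. It also differs from the method in one respect, which is my own assumption about what is convenient: it reindexes the result with array_values after array_unique, so the returned array has consecutive keys.

```php
<?php
// Hypothetical standalone version of get_ngrams(), for illustration only.
function ngrams_sketch($str, $size = 5, $clean = true)
{
    if ($clean) {
        // Strip non-alphanumeric characters and lowercase the input.
        $str = strtolower(preg_replace('/[^A-Za-z0-9]/', '', $str));
    }

    $ngrams = array();
    $length = strlen($str);

    for ($i = 0; $i < $length; $i++) {
        $candidate = substr($str, $i, $size);
        // Trailing fragments shorter than $size are kept; single characters are not.
        if (strlen($candidate) > 1) {
            $ngrams[] = $candidate;
        }
    }

    if ($clean) {
        // array_unique() preserves the original keys, so reindex afterwards
        // (an addition in this sketch, not in the original method).
        $ngrams = array_values(array_unique($ngrams));
    }

    return $ngrams;
}

// "Hello, World!" is cleaned to "helloworld" before slicing.
$result = ngrams_sketch('Hello, World!', 5);
// $result is:
// hello, ellow, llowo, lowor, oworl, world, orld, rld, ld
```

With $clean set to false the string is sliced as-is and duplicates stay in place, so ngrams_sketch('abab', 2, false) gives ab, ba, ab.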