Jul 11

I’ve been developing an n-gram based search in PHP for a secondary project at work. For development purposes, I’ve been scoring things in PHP while I tweaked by algorithm, but now that I’m getting closer to the implementation stage of things I’m trying to push the work of scoring onto mySQL.

I was hoping mySQL had some method of counting occurrences of a string within a string, but much to my chagrin it doesn’t have this. Luckily, Andrew Hanna posted a great custom mySQL function that does exactly this. For the sake of posterity and my own reference, I’ve posted his function below.

DROP FUNCTION IF EXISTS substrCount||
CREATE FUNCTION substrCount(s VARCHAR(255), ss VARCHAR(255)) RETURNS TINYINT(3) UNSIGNED LANGUAGE SQL NOT DETERMINISTIC READS SQL DATA
BEGIN
DECLARE count TINYINT(3) UNSIGNED;
DECLARE offset TINYINT(3) UNSIGNED;
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET s = NULL;

SET count = 0;
SET offset = 1;

REPEAT
IF NOT ISNULL(s) AND offset > 0 THEN
SET offset = LOCATE(ss, s, offset);
IF offset > 0 THEN
SET count = count + 1;
SET offset = offset + 1;
END IF;
END IF;
UNTIL ISNULL(s) OR offset = 0 END REPEAT;

RETURN count;
END;

usage is this…

SELECT substrCount('/this/is/a/path', '/') `count`;

leave a reply

You must be logged in to post a comment.