Functions: What is the Goddiest line in the bible?

To be sure, the word God occurs a fair few times in the bible. 4117 times at least.:

SELECT count(*) FROM bible WHERE line RLIKE 'god';
+----------+
| count(*) |
+----------+
|     4117 |
+----------+

In fact 4117 is a tad on the low side. The count(*) query scores 1 point for each line with the word 'god' in it, but plenty of lines mention god twice or more. Not to mention all the other ways of naming God: Lord, Jehovah, our Father etc.

So let's construct a user defined SQL function that will count the number of times the Big G is mentioned on any line. Then we can really see how many times God is mentioned in the bible.

First things first: we have to define what counts as a single reference to God. The first verse of the bible contains one reference.

select id,line from bible where id = 1 \G
*************************** 1. row ***************************
  id: 1
line: In the beginning God created the heaven and the earth.

But line 787 contains three references. The first LORD, then LORD God, and finally God of Isaac.

  id: 787
line: And, behold, the LORD stood above it, and said, I am the LORD God of Abraham thy 
father, and the God of Isaac: the land whereon thou liest, to thee will I give it, and to thy 
seed;

Here's a verse which has two references: LORD and LORD JEHOVAH.

  id: 18135
line: Trust ye in the LORD for ever: for in the LORD JEHOVAH is everlasting strength:

By the way, you can't rely on the King James bible to be consistent with capitalisation. Mostly its LORD God, but you will find a fair few Lord GODs as well. With built in functions like rlike, MySQL is case insenstive out of the box. But to force case-sensitive comparisons, the easiest (but by no means only) way is to use the keyword BINARY, e.g.

SELECT id,line FROM bible WHERE line RLIKE BINARY 'LORD God' ;

I digress. To help our new godcount function, we will put all the different ways of referring to God into a table together with an accompanying wordcount, like so:

DROP TABLE IF EXISTS godnames;

CREATE TABLE godnames (
   autokey TINYINT UNSIGNED  NOT NULL PRIMARY KEY AUTO_INCREMENT,
   gname VARCHAR(100),
   wc TINYINT UNSIGNED  NOT NULL
);

INSERT INTO godnames (gname)
VALUES
('God'),('GOD'),('Lord'),('LORD'),('Jehovah'), ('Almighty'),
('Almighty God'),('Lord Almighty'),('LORD JEHOVAH'), ('Lord God Almighty'),
('Lord God'), ('LORD GOD'),('LORD God'),('Lord GOD'),
('LORD my God'),('LORD is God'),('LORD be God'),
('LORD our God'),('LORD thy God'),('LORD his God'),('LORD the God'),
('LORD your God'),('LORD their God')
;

UPDATE godnames SET wc = 1 + LENGTH(gname) - LENGTH(REPLACE(gname, ' ', ''));

SELECT * FROM godnames;
+---------+-------------------+----+
| autokey | gname             | wc |
+---------+-------------------+----+
|       1 | God               |  1 |
|       2 | GOD               |  1 |
|       3 | Lord              |  1 |
|       4 | LORD              |  1 |
|       5 | Jehovah           |  1 |
|       6 | Almighty          |  1 |
|       7 | Almighty God      |  2 |
|       8 | Lord Almighty     |  2 |
|       9 | LORD JEHOVAH      |  2 |
|      10 | Lord God Almighty |  3 |
|      11 | Lord God          |  2 |
|      12 | LORD GOD          |  2 |
|      13 | LORD God          |  2 |
|      14 | Lord GOD          |  2 |
|      15 | LORD my God       |  3 |
|      16 | LORD IS God       |  3 |
|      17 | LORD be God       |  3 |
|      18 | LORD our God      |  3 |
|      19 | LORD thy God      |  3 |
|      20 | LORD his God      |  3 |
|      21 | LORD the God      |  3 |
|      22 | LORD your God     |  3 |
|      23 | LORD their God    |  3 |
+---------+-------------------+----+

What this table says is a formula like 'Lord my God' is one reference to God, rather than two ('Lord' and 'God'). Note that I've worked in all the different case options. And also note if you find other ways of referring to God in the bible, (which there are, this is an SQL tutorial not a divinity lesson) it would pretty easy to insert them in to this table.

The last column is wc for word count. I didn't manually count the words. I use a crafy bit of SQL to do it for me:

UPDATE godnames SET wc = 1 + LENGTH(gname) - LENGTH(REPLACE(gname, ' ', ''));

This query zaps any spaces in gname, so "Lord God' becomes 'LordGod'. Now there's 8 letters in 'Lord God' but only 7 in 'LordGod'. So the word count is given by 8 -7 = 1, so the word count is 1. Well, 2 actually, as we have to add 1 on to that difference. Another example: there's 11 letters in 'Lord my god' but only 9 in 'lordmygod'. 11 - 9 = 2, so the word count is 2 + 1 = three. This replace and count technique will be used in the godcount function below, so keep it in mind.

Another built in function we will use in our godcount function is REPEAT(), which simply repeats a string so many times

SELECT REPEAT('x',3);
+---------------+
| REPEAT('x',3) |
+---------------+
| xxx           |
+---------------+


Now for our big user defined godcount() function. The alogorithm basically loops through every name in the godnames table, starting with the longest names first and ending with the shortest. If there's a godname in the line under consideration we replace the godname in the line with XXXXXXX but, here's the trick, with one less X then there are letters in the godname. So instead of "LORD GOD" we put in "XXXXXXX". Subtracting the X'ed out line length from the original length tells us how many times that particular godname crops up per line. We keep on looping through the godnames table until all the godnames are used up. A variable, count, within our procedure to keeps track of the count so we can return it at the end.

DROP FUNCTION IF EXISTS godcount;

delimiter //

CREATE FUNCTION godcount (line TEXT) RETURNS INT
   SQL SECURITY DEFINER
   DETERMINISTIC
   LANGUAGE SQL
   COMMENT 'Counts the number of times God appears in a line'

   BEGIN
      DECLARE counter INT DEFAULT 0;
      DECLARE all_rows_selected INT DEFAULT 0;
      DECLARE line_copy_1, line_copy_2 TEXT;
      DECLARE godname VARCHAR(100);

      DECLARE gnc CURSOR FOR
              SELECT gname FROM godnames ORDER BY wc DESC;

      DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET all_rows_selected = 1;

      SET line_copy_1 = line;

      OPEN gnc;
      godname_loop: LOOP

          FETCH gnc INTO godname;
          IF all_rows_selected THEN LEAVE godname_loop; END IF;

          SET line_copy_2 = REPLACE(line_copy_1, godname, REPEAT('x',LENGTH(godname) -1));  
          SET counter = counter + LENGTH(line_copy_1) - LENGTH(line_copy_2);
          SET line_copy_1 = line_copy_2;

      END LOOP godname_loop;
      CLOSE gnc;
 
      RETURN counter;
   END;
//
delimiter ;

Examining the code a bit more carefully, the first line is a DROP FUNCTION IF EXISTS statement. This makes for much easier cutting and pasting during development. I find Stored Procedure syntax quite fiddly and it takes me umpteen times to get it right.

The second line redefines the end of line delimiter in the mysql client from a ; to a //, otherwise we would never be able to enter our stored procedure in the first place.

The third line CREATE FUNCTION godcount (line TEXT) RETURNS INT tells the server that the godcount() function expectd to be fed some TEXT and will throw up an INT in return. Strictly speaking we could have made it return INT UNSIGNED, as a line can hardly have a negative godcount.

The next four lines aren't necessary at all, but they're kind of nice to express explicity. SQL SECURITY DEFINER means that any calling this function will have your level of SQL access permissions regarding tables. The alternative is SQL SECURITY INVOKER which means a user would keep their own privileges. DEFINER is the default.

DETERMINISTIC is suppoed to give a little hint to the optimizer as to how your algorithm works. A deterministic query always gives the same answer for a given input, which in the case of the bible is true. (Unless the text of the bible changes, which it really shouldn't.) The opposite is NOT DETERMINISTIC. Either way it's not too important because MySQL 5.0 ignores it. So much for being kind to the optimizer.

LANGUAGE SQL means the function is written in Structured Query Language, which is the only language MySQL understands, so this is the default value. I guess it opens the way for LANGUAGE PERL or LANGUAGE SHAKESPEARE to be options for future releases of MySQL.. As far as I know there are no plans to do this, but what with MySQL being an Open Source product, I'm sure somebody out there is working on a Perl version as we speak.

COMMENT is, of course, a comment, which shows up in 'SHOW CREATE FUNCTION godcount' or buried in the depths of the information_schema database.

Now for the function body, which is wrapped in a BEGIN... END; block.
First we declare some variables. SQL isn't like a scripting language where you can just magic a variable into existence. Then we declare a CURSOR, which is a way of using a SELECT to loop through a table. The cursor gnc is a container for all of our godnames in the godname table. If this was a 'normal' language gnc would be an array, or a hash, but in SQL land it's a cursor attached to a SELECT.

To get all the godnames to loop through into the gnc variable we have to use a very inelegant trick, which is what you see with DECLARE CONTINUE HANDLER on the next line. SQLSTATE 02000 is what happens when a select runs out of rows, it means sorry, no more data. So this handler says when there are no more rows, don't bail out with an error, but instead set this little wee flag here, all_rows_selected, which we declared as in INT a few lines earlier.

SET line_copy_1 = line; - Because we plan to comprehensively mangle the line passed in as argument, we copy it to a variable.

OPEN gnc; - Cursors have to be opened before they can be used, which basically runs the select statement. Then we kick off another loop, which I've labelled godname_loop. Loops don't have to be labelled, but anything which helps make code syntax more readable is a good thing in my book.

FETCH gnc INTO godname; puts the value 'Lord their God' from the godnames table into our routine's godname variable. Next we check to see whether our all_rows_selected flag is set.

The first of the three SET statements replaces any godname in the line with XXXX, only one less X than needed. (see above). Then we do the length substraction thing (also see above) and add on any difference to our total count. We hold on on to our line_copy_2 variable, even though it may have loads of Xs in it, as we don't want to count any godnames twice.

The inner loop terminates when the godname table is exhausted of all possibilities. All that remains to do is return the counter. Job done.

Let's check our new function works properly. It should give a count of three for line 787 which we considered above:

SELECT id, godcount(line), line FROM bible WHERE id = 787 \G
*************************** 1. row ***************************
            id: 787
godcount(line): 3
          line: AND, behold, the LORD stood above it, AND said, I am the LORD God of Abraham thy father, AND the God of Isaac: the land whereon thou liest, to thee will I give it, AND to thy seed;

Bingo! Three.
Now let's do what we set out to do and find the goddiest line in the bible. (Yes, I know that's a made-up word.) I should warn you this query takes a few seconds to run.

SELECT MAX(GODCOUNT(line)) FROM bible;
+---------------------+
| MAX(GODCOUNT(line)) |
+---------------------+
|                   6 |
+---------------------+
1 row IN SET (7.51 sec)

OK, so at least one line has a godcount of 6. Let's have a look:


SELECT id, book,cap,verse, line FROM bible WHERE GODCOUNT(line)=6 \G
*************************** 1. row ***************************
   id: 28287
 book: Rom
  cap: 14
verse: 6
 line: He that regardeth the day, regardeth it unto the Lord; and he that regardeth not the day,
 to the Lord he doth not regard it. He that eateth, eateth to the Lord, for he giveth God thanks; 
and he that eateth not, to the Lord he eateth not, and giveth God thanks.

There you go, Romans Cap 14 verse 6 is the goddiest line in the bible, with 6 mentions of the Lord.

And what's the total number of times God is mentioned in the bible? All we need to do is wrap godcount() in SUM() and wait for a few seconds.

SELECT SUM(GODCOUNT(line)) AS total_godcount  FROM bible ;
+----------------+
| total_godcount |
+----------------+
|          11000 |
+----------------+
1 row IN SET (7.24 sec)