I would like to use this installment of my blog to show some differences between VFP and MYSQL on how problems need to be solved. On the top it seems to me that the MYSQL optimizer needs a quite different strategy and some more help then the “rushmore” one in VFP. There is probably some room for improvements left in both approaches but I usually stop optimizing when I get under 2 seconds on getting results.
The task was to roll back the AR aging to a prior date. Basically a standard business problem. Accounting comes, looks at last months aging, makes some adjustments and then wants to rerun the aging, or the bank wants it as of a particular date or things just went nuts and everyone was busy with … whatevever …. and did not get to run the aging at the proper time. So to handle this a query was devised to back out all invoices and applications as to a certain date.
In the late 80’s the customer got and SBT accounting system and that was then modified and modernized over the decades so the invoices, credits and payments are stored in the invoice master file. fields if interest: Invamt, Paidamt, Invdte, Balance, custno, salesmn etc. Cash and credit applications are in a file called cc_apply in VFP the problem was solved with a query that ran into a little under 2 seconds and looked like this
SELECT a.custno, ;
a.invno AS invno, ;
a.invdte AS invdte, ;
a.balance + NVL((SELECT SUM(f.apply + f.disc) ;
FROM “accounts receivables!cc_apply” f;
WHERE f.invno = a.invno AND TTOD(f.appdate) > ld_date), 0) ;
- NVL((SELECT SUM(f.apply) ;
FROM “accounts receivables!cc_apply” f;
WHERE f.ccati = a.invno AND TTOD(f.appdate) > ld_date), 0) AS balance, ;
a.paidamt – NVL((SELECT SUM(f.apply + f.disc) ;
FROM “accounts receivables!cc_apply” f;
WHERE f.invno = a.invno AND TTOD(f.appdate) > ld_date), 0) AS paidamt, ;
a.invamt – NVL((SELECT SUM(f.apply) ;
FROM “accounts receivables!cc_apply” f;
WHERE f.ccati = a.invno AND TTOD(f.appdate) > ld_date), 0) AS invamt ;
FROM “accounts receivables!invoice master” a, customer\customer cu, ;
(SELECT DISTINCT e.invno as invno FROM (SELECT d.invno as invno ;
FROM “accounts receivables!invoice master” d ;
WHERE d.invdte <= ld_date AND d.balance # 0 AND !d.arstat $ “Vv” ;
UNION SELECT d.invno as invno ;
FROM “accounts receivables!invoice master” d ,;
“accounts receivables!cc_apply” b ;
WHERE d.invno = b.invno AND d.invdte <= ld_date AND d.balance = 0 AND !d.arstat $ “Vv” ; AND TTOD(b.appdate) > ld_date ;
AND (b.apply # 0 OR b.disc # 0) ;
UNION SELECT d.invno as invno ;
FROM “accounts receivables!invoice master” d ,;
“accounts receivables!cc_apply” b ;
WHERE d.invno = b.ccati AND d.invdte <= ld_date AND d.balance = 0 AND !d.arstat $ “Vv” ; AND TTOD(b.appdate) > ld_date ;
AND (b.apply # 0 OR b.disc # 0)) e) c ;
WHERE a.invno = c.invno AND a.custno = cu.custno AND !a.arstat $ “Vv” AND &lc_filter;
ORDER BY a.custno, a.invdte, a.invno ;
INTO CURSOR c_arorec NOFILTER READWRITE
&lc_filter was used to filter out some other parameters like sales rep, state, customer etc.
The query was ported into MYSQL, NVL() was replaced with COALESCE() and the statement that took about 2 seconds under VFP took a little over 50 in MYSQL. So a different approach was needed. MYSQL, even with hints seemed to have problems using the right indexes or come up with a reasonable plan as the EXPLAIN statement showed. Even though VFP (Visual foxpro) does not have a feature to show you exactly what it is doing, on a proper written query it will do most the selecting inside the indexes if possible as that is cheaper. The bottle neck in VFP is usually the network. As the processing gets done at the workstation you want to cut he total # of bytes sent to the workstation for processing down.
In MYSQL its disk reads. I could not convince the optimizer to do things in the sequence I wanted on that query so I had no choice than the break that query and spoon feed it to the optimizer. From past trials it appears to me that the way to go with MYSQL is to cut down rowcount and disk read/writes. So here is the way I got MYSQL to solve the above problem in under 2 seconds.
For starters to cut the row count down as early as possible I decided to employ some level of denormalisation and put the custno, salesman, state,invdte, dtepaid into the cc_apply file. I keep that in sync in the INSERT and update triggers and with foreign keys. This puts a little overhead on the INSERT and UPDATE statement but pays of big time at the time of data SELECTION. Furthermore, as we were going to do this in multiple SQL statement we wrap those statement into a stored procedure to handle our parameters and selections like
CREATE DEFINER=`whoever`@`%` PROCEDURE `arorec`(v_date DATE,
v_state varchar(2), v_salesmn varchar(2), v_custno varchar(6), v_sum TINYINT)
READS SQL DATA
DETERMINISTIC
BEGIN
SET @v_date = v_date;
SET @v_custno = v_custno;
SET @v_state = v_state;
SET @v_salesmn = v_salesmn;
SET @v_sum = v_sum; — summary or not
/*Note need to use tempres and tempcr as you cannot use temp table(tempsel) 2x in a union*/
DROP TABLE IF EXISTS tempsel;
DROP TABLE IF EXISTS tempres;
DROP TABLE IF EXISTS tempcr;
SO for starters we needed all the invoices that were created before the cut of date and that had moneys applied to them after the cut of date. As the resulting set is definitly way below the available ram on the target server we will put this resutl set into a temporary table in memory with
IF @v_custno IS null AND @v_state IS NULL AND @v_salesmn IS NULL THEN
/* GET EVERYONE*/
PREPARE stmt1 FROM ”
CREATE TEMPORARY TABLE tempsel ENGINE = MEMORY SELECT invno, Sum(apply) as apply, SUM(disc) AS disc
FROM cc_apply WHERE invdte <= ? AND appdate > ? GROUP BY invno” ;
EXECUTE stmt1 USING @v_date, @v_date;
ELSEIF @v_custno IS NOT NULL AND @v_salesmn IS NULL THEN
/* State is irrelevant as each customer has one address customere only*/
…….The selections for other criteria
Next we had to join this with the invoice and CM header records in the invoice master table plus add all the invoices that were created the cutoff date, are still open and did not have anything applied to them. Normally we could do that in a UNION, but MYSQL does not let you use a UNION on a TEMPORARY TABLE so we use another temporary table like this
PREPARE stmt1 FROM ”
CREATE TEMPORARY TABLE tempres ENGINE = MEMORY
SELECT ma.custno, ma.ticket AS refno, ma.dtepaid, ma.artype, ma.ponum AS checkno, 0000000000.00 AS future,
ma.invno, ma.balance + te.apply – te.disc AS BALANCE, ma.entered, ma.invdte, ma.invamt, ma.paidamt, getpd(@v_date, ma.invdte) AS gp
FROM `invoice master` ma INNER JOIN tempsel te ON ma.invno = te.invno
WHERE ma.invdte <= ? AND artype IN (‘ ‘, ‘I’, ‘C’, ‘W’)”;
EXECUTE stmt1 USING @v_date;
DEALLOCATE PREPARE stmt1;
ALTER TABLE tempres ADD INDEX custno(custno);
/* UNION of invoices still open without anything applied and futures*/
IF @v_custno IS null AND @v_state IS NULL AND @v_salesmn IS NULL THEN
Prepare stmt1 FROM ”
INSERT INTO tempres
SELECT ma.custno, ma.ticket AS refno, ma.dtepaid, ma.artype, ma.ponum AS checkno, 0000000000.00 AS future,
ma.invno, ma.balance, ma.entered, ma.invdte, ma.invamt, ma.paidamt, getpd(@v_date, ma.invdte) AS gp
FROM `invoice master` ma LEFt JOIN tempsel t2 ON ma.invno = t2.invno
WHERE t2.invno IS NULL and ma.open = 1 and ma.invdte <= ? AND artype IN (‘ ‘, ‘I’, ‘C’, ‘W’) UNION SELECT ma.custno, ma.ticket AS refno, ma.dtepaid, ma.artype, ma.ponum AS checkno, ma.invamt AS future, ma.invno, ma.balance + COALESCE(cc.apply, 0) – COALESCE(cc.disc, 0) AS BALANCE, ma.entered, ma.invdte, ma.invamt, ma.paidamt, getpd(@v_date, ma.invdte) AS gp FROM `invoice master` ma LEFT JOIN cc_apply cc ON ma.invno = cc.invno WHERE ma.invdte > ? AND ma.entered <= ? AND artype IN (‘ ‘, ‘I’, ‘C’, ‘W’)”;
EXECUTE stmt1 USING @v_date, @v_date, @v_date;
ELSEIF @v_custno IS NOT NULL AND @v_salesmn IS NULL THEN
……. The other statements for selection by salesmn, state, etc
We then do the same for the payments and end up with a temporary table “TEMPRES” that holds all the records needed for our aging.
In VFP we then used “SET RELATION TO” to get access to some other relevant fields for our reporting but it is much more efficient to have the MYSQL server send all the data in one result set so we are doing this in the final SELECT statement that looks something like this
SELECT te.*, cu.email, cu.statml, cu.statfmt, cu.company, cu.phone, cu.faxno, cu.contact, cu.address1,
cu.address2, cu.city, cu.state AS state, cu.zip, cu.pterms, cu.lastpay AS lastpay, cu.limit,
cu.lpymt, cu.ldate AS ldate, cu.lsale, cu.inactive AS inactive,
IF(te.gp=’C',te.invamt – te.paidamt,0.00) AS current,
IF(te.gp=’1′,te.invamt – te.paidamt,0.00) AS pd1,
IF(te.gp=’2′,te.invamt – te.paidamt,0.00) AS pd2,
IF(te.gp=’3′,te.invamt – te.paidamt,0.00) AS pd3,
IF(te.gp=’3′,te.invamt – te.paidamt,0.00) AS pd4,
IF(te.gp=’O',te.invamt – te.paidamt,0.00) AS pdover
FROM tempres te INNER JOIN customer cu ON te.custno = cu.custno
ORDER by te.custno ASC, te.invdte ASC, te.invno ASC;
The getpd() is a stored function that determines based on parameters stored in another table if we want the output in 30/60/90/120, 7/14/28/35. based on the end of the week, end of month or any date in between to allow full flexibility in the aging output.
For those interested. On a table with about 500,000 invoices records, 300,000 records in the application table, on my development server on my laptop the whole SP takes 1.4 seconds in the worst case scenario. Granted the complete stored procedure with formattin as shown above is over 300 lines long with many statements as compared to one straight forward SQL statement in VFP. But the end result is on par with VFP with the added benefit to provide improved results over slow connections over the internet
Tags: MYSQL, Optimizing, Queries, VFP