Call(Overhead)
The Challenge
For decades, MV BASIC has featured the CALL() statement, which lets mainline programs invoke external subroutines. Typically, one does this to share common code, processing, or algorithms, thus upholding the development principle of DRY (Don't Repeat Yourself). However, such features often come with a performance penalty. The question is, "How much of a performance penalty?"
Recently, while reading through the D3 Reference Manual for version 9.2.1, I noticed some performance tips mentioned in the Compiling Programs subsection of the chapter on BASIC. In particular, this statement:
To obtain the best possible compile-time and run-time performance, Rocket recommends breaking up applications into small modules. Pick BASIC removes the traditional run-time overhead of large numbers of calls and is able to create more efficient code when modules are smaller
The part that stood out to me was, "Pick BASIC removes the traditional run-time overhead of large numbers of calls...". Oh, really? That sounded like something worth testing. So, I did.
The Method
I didn't have a lot of free time, so I knew from the start that my approach would be simple. All I wanted to do was quickly discover how much of a performance penalty there might be using a subroutine CALL() vs. in-line code. Since I was doing the research, I decided to also test if making local subroutine calls via GOSUB had any significant impact on performance.
Not wanting to consider file I/O, my program only did a simple mathematical calculation within a tight FOR/NEXT loop.
FOR THIS.CALC = 1 TO MAX.CALCS
RESULT = ((THIS.CALC*THIS.CALC) + THIS.CALC) / THIS.CALC
NEXT THIS.CALC
To keep it somewhat flexible for others to test their own platforms and scenarios, I included several control options: the user can set how many times each test runs, the number of iterations of the FOR/NEXT calculation loop, and the SYSTEM() code used to get the CPU time in milliseconds for their particular MV implementation.
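As a rough illustration of the approach described above, here is a minimal sketch that times the same calculation in-line, via GOSUB, and via CALL. It is not the program from the repository. The names CPU.CODE, START.MS, DO.CALC, and CALC.SUB are hypothetical, and SYSTEM(9) is only assumed to return CPU milliseconds; check the SYSTEM() codes for your own MV implementation. It also runs each test once, where the real tests were repeated and averaged.

```basic
* Sketch only: time one calculation three ways.
* ASSUMPTION: SYSTEM(9) returns CPU milliseconds; change CPU.CODE if not.
* CALC.SUB is a hypothetical external subroutine (shown further down).
CPU.CODE = 9
MAX.CALCS = 100000
*
* In-line version
START.MS = SYSTEM(CPU.CODE)
FOR THIS.CALC = 1 TO MAX.CALCS
   RESULT = ((THIS.CALC*THIS.CALC) + THIS.CALC) / THIS.CALC
NEXT THIS.CALC
INLINE.MS = SYSTEM(CPU.CODE) - START.MS
*
* Local subroutine (GOSUB) version
START.MS = SYSTEM(CPU.CODE)
FOR THIS.CALC = 1 TO MAX.CALCS
   GOSUB DO.CALC
NEXT THIS.CALC
GOSUB.MS = SYSTEM(CPU.CODE) - START.MS
*
* External subroutine (CALL) version
START.MS = SYSTEM(CPU.CODE)
FOR THIS.CALC = 1 TO MAX.CALCS
   CALL CALC.SUB(THIS.CALC, RESULT)
NEXT THIS.CALC
CALL.MS = SYSTEM(CPU.CODE) - START.MS
*
PRINT "In-line ms: ":INLINE.MS
PRINT "GOSUB ms  : ":GOSUB.MS
PRINT "CALL ms   : ":CALL.MS
* Guard against a zero baseline before computing the multiplier
IF INLINE.MS > 0 THEN PRINT "CALL x in-line: ":CALL.MS / INLINE.MS
STOP
*
DO.CALC:
   RESULT = ((THIS.CALC*THIS.CALC) + THIS.CALC) / THIS.CALC
   RETURN
```

The hypothetical external subroutine would be compiled and cataloged separately before running the caller:

```basic
SUBROUTINE CALC.SUB(THIS.CALC, RESULT)
* Hypothetical external subroutine: same calculation as the in-line loop.
RESULT = ((THIS.CALC*THIS.CALC) + THIS.CALC) / THIS.CALC
RETURN
```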
Since my main platform is D3, I tested both Flashed and non-Flashed versions of the programs.
The Results
The tables below show, on average, how much longer the test program took to run through its calculations when using CALL() vs. in-line code.
Flashed Code
Iterations | x In-Line Time
25,000 | 1.59
50,000 | 1.50
75,000 | 1.48
100,000 | 1.47
250,000 | 1.49
500,000 | 1.50
750,000 | 1.21
1,000,000 | 1.10
Non-Flashed Code
Iterations | x In-Line Time
25,000 | 4.00
50,000 | 3.88
75,000 | 4.03
100,000 | 3.95
250,000 | 1.52
500,000 | 1.36
750,000 | 1.32
1,000,000 | 1.32
It should not be surprising that there is a performance penalty when using CALL() vs. in-line code. What is surprising is how the penalty changed as iterations increased. I expected the penalty to be somewhat consistent, or possibly exhibit a slight linear increase as iterations increased, but the results suggest otherwise.
Conclusions
"Pick BASIC removes the traditional run-time overhead of large numbers of calls...".
That's what Rocket asserts, and what I set out to test. I knew from the beginning that using CALL() would impose a penalty, but after testing, the real question became, "Does it matter?"
I guess that depends on the nature of your operation and what you're trying to achieve. What's not shown in this article are the execution times of the tests. Without considering those times, you're only getting part of the entire picture.
For example, if it takes 46 seconds to run 1,000,000 iterations using in-line code, it will take about 4.6 seconds longer (46 x 1.10 = 50.6 seconds) to use an external call. Is that significant enough to justify dropping the convenience of CALL() to in-line the code and save those seconds? For me, the answer is no.
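The same arithmetic can be sketched in a few lines of BASIC, which makes it easy to plug in your own timings. The 46 seconds and the 1.10 multiplier are the figures from the example above; the variable names are made up for illustration.

```basic
* Convert a table multiplier into absolute seconds (illustrative names).
INLINE.SECS = 46      ;* example in-line run time from the text
MULTIPLIER = 1.10     ;* Flashed result at 1,000,000 iterations
CALL.SECS = INLINE.SECS * MULTIPLIER
EXTRA.SECS = CALL.SECS - INLINE.SECS
PRINT "CALL run time (secs) : ":CALL.SECS
PRINT "Extra over in-line   : ":EXTRA.SECS
```

With these inputs the CALL run comes to 50.6 seconds, 4.6 seconds more than the in-line run.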
So, in defense of Rocket's assertion about CALL(), I would say that I agree with them. For the vast majority of processing, the convenience of using CALL() to keep your code modular and DRY is going to far outweigh the performance gained by in-lining.
Two Final Notes.
Flash-compile your code. It's significantly faster than non-Flashed code.
The difference between in-line calculations and executing them via GOSUB is not significant enough to even consider dispensing with the convenience of GOSUB (a 60 ms differential on 1,000,000 iterations).
The Code
Download the code from the FOSS4MV repository at Bitbucket.