How fast should your scripts run? The Lua literature and user groups often mention Lua's remarkable speed. The Script library is designed so that most numerically intensive processing uses Mira's core library of highly optimized array functions. However, there are times when you need to do intensive numeric processing inside the script itself, as when working with large tables of data or processing millions of values using your own script code. So that you aren't left in the dark about what "fast" means, here are some benchmarks we've measured at Mirametrics. The table below suggests what to expect from your numerically intensive scripts. Many benchmarks list the script source code used. A description of the test procedures and test machine is given at the bottom.
NOTE: Some of these capabilities are available only in MX Script.
| Benchmark | Time (sec) | Speed |
|---|---|---|
| Create a 64-bit real image from an array of 1 million elements using I:Set(t). | 0.0791 | 12.7 million elements / sec |
| Create a 64-bit real image from an array of 250,000 elements using I:Set(t). | 0.0181 | 13.8 million elements / sec |
| Create a 64-bit real image from an array of 10,000 elements using I:Set(t). | 0.000766 | 13.1 million elements / sec |
| Set 1 million elements in a Lua table using t={}; for k=1,1000000 do t[k]=k end | 0.220 | 4.54 million elements / sec |
| Set 10 million table elements in a global table using t={}; for k=1,10000000 do t[k]=k end | 2.53 | 3.95 million elements / sec |
| Create and set 1 million elements in a local table using local t={}; for k=1,1000000 do t[k]=k end | 0.096 | 10.4 million elements / sec |
| Perform 10 million multiplies using local values: local n=0; local m=0; for i=1,10000000 do k=n*m end | 0.314 | 31.9 million multiplies / sec |
| Perform 10 million divides using local values: local n=0; local m=0; for i=1,10000000 do k=n/m end | 0.323 | 31.0 million divides / sec |
| Perform 10 million adds using local values: local m=0; local n=0; for i=1,10000000 do k=n+m end | 0.316 | 31.6 million adds / sec |
| Perform 10 million additions using global values: n=0; m=0; for i=1,10000000 do k=n+1000 end | 0.898 | 11.1 million adds / sec |
| Perform 10 million additions using global values: k=0; for i=1,10000000 do k=k+1 end | 0.809 | 12.4 million adds / sec |
| Perform 10 million empty loops: k=0; for k=1,10000000 do end | 0.177 | 56.5 million loops / sec |
| Perform 10 million divides and save in a local array: local t={}; local m=3; for i=1,10000000 do t[i]=i/m end | 1.46 | 6.85 million divides / sec |
| Least squares solution of 100 points with 4 parameters and 3 variables using a "hyperplane" basis function declared in the script | 0.042 | 24 fits / sec |
| Least squares solution of 100 points with 4 parameters and 3 variables using internal "hyperplane" basis function | 0.000556 | 1,800 fits / sec |
| Least squares solution of 10 points with 4 parameters and 3 variables using internal "hyperplane" basis function | 0.000055 | 18,000 fits / sec |
| Least squares solution of 1000 points using a 3x2 (6 term) 2-D polynomial. | 0.0008 | 1,250 fits / sec |
| Least squares solution using CLsqFit class to fit 10 points with a 3x2 (6 parameter) 2-D polynomial. | 0.000041 | 24,400 fits / sec |
| Least squares solution using a 6 term polynomial to fit 1000 points. This example uses the simple global function TFit, although greater versatility is available using the CLsqFit class. The data are contained in an array t. This function returns up to 4 results: an array of coefficients, an array of errors, the fit standard deviation, and the sample mean: t = {} c,e,s,m = TFit(t,6) | 0.0011 | 900 fits / sec |
| Create 1 million uniformly distributed random numbers. t = TRand(1000000) | 0.243 | 4.1 million numbers / sec |
| Create 1 million Gaussian distributed random numbers. t = TGaussDev(1000000) | 1.16 | 862,000 numbers / sec |
| Histogram of 1 million real numbers using 100 bins. Adopting other than default parameters requires using class methods, as shown here: H = NewHist() H:SetBins(100) H:Calc() | 0.167 | 6 million numbers / sec |
| Histogram of 1 million real numbers, pre-sorted. This uses the global THist function, which is the function that is benchmarked: t = TRand(1000000) TSort(t) THist(t) | 0.093 | 11 million numbers / sec |
| Add two 1000x1000 64-bit real images | 0.00425 | 236 images / sec |
| Add two 1000x1000 32-bit real images | 0.00198 | 500 images / sec |
| Add two 1000x1000 16-bit integer images | 0.00142 | 704 images / sec |
| Add two 1000x1000 24-bit RGB images | 0.00414 | 242 images / sec |
| Add two 1200x960 48-bit URGB images | 0.00444 | 225 images / sec |
| Multiply two 1000x1000 32-bit real images | 0.0033 | 300 images / sec |
| Multiply 1000x1000 32-bit real image by a number | 0.00475 | 210 images / sec |
| Divide two 1000x1000 32-bit real images | 0.0094 | 106 images / sec |
| Start with image I[1], which is a 1000x1000 pixel 16-bit image. Convert it to "float" data type and then do various arithmetic operations on it: I[1]:SetDatatype("float") I[2] = I[1] + 1000 I[3] = I[1] / I[2] I[4] = I[1] ^ I[3]. All 4 images use 32-bit real pixel type. This process involves creating four 4 MB images as well as the mathematical operations between them. The last operation raises I[1] to the power of I[3], pixel by pixel (a very CPU-intensive computation). The benchmark includes all 4 steps. | 0.097 | 10.3 million pixels / sec |
| Same as above plus display all 4 final images in a new image window. This includes computation of an image histogram, transfer function, and palette mapping for each image. | 0.369 | 2.7 images / sec |
| Load 1 Megapixel image of 16 bit pixels from hard drive, compute complete image histogram and auto-scale transfer function using gamma=0.6, then display in a new window. | 0.125 | 8 images / sec |
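To get a feel for how the Speed column relates to the Time column, you can time one of the pure-Lua rows yourself. The following is a minimal sketch that uses only the standard Lua library. The helper name fill_rate is hypothetical and is not part of the Script library, and os.clock reports CPU time, which may differ from the timer used for the table above.

```lua
-- Fill a local table with n integers and report the elapsed time and
-- the "elements / sec" rate, as in the Speed column of the table.
-- fill_rate is a hypothetical helper name used only for this sketch.
local function fill_rate(n)
  local start = os.clock()          -- CPU time in seconds
  local t = {}
  for k = 1, n do
    t[k] = k
  end
  local elapsed = os.clock() - start
  -- If the clock did not advance, n / 0 evaluates to inf rather than failing.
  return t, elapsed, n / elapsed
end

local t, elapsed, rate = fill_rate(1000000)
print(string.format("%d elements in %.4f sec = %.2f million elements / sec",
                    #t, elapsed, rate / 1e6))
```

Your numbers will differ from the table, since they depend on the machine, the Lua version, and whatever else is running.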
Conclusions
If there are any major results to be gleaned from the table above, they are as follows:
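- Mira's use of Lua provides a high-performance scripting language: on the test machine, simple arithmetic on local values ran at roughly 31 million operations per second.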
- Declare values as local whenever possible. This has 2 advantages: it is faster when the value is used many times in a loop, and it keeps the global namespace from being polluted by different values with conflicting names.
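Both advantages of local values can be seen in a short, standard-Lua sketch. The function names below are hypothetical and chosen only for this illustration. The same sum is computed once with a global accumulator and once with a local one; the global version also leaves its variable behind in the global namespace.

```lua
-- The same 1-million-iteration sum, written with a global and with a local.
-- sum_with_globals, sum_with_locals and time_it are hypothetical names.
local N = 1000000

local function sum_with_globals()
  total = 0                      -- global: stored in the global table
  for i = 1, N do
    total = total + i            -- every access is a table lookup
  end
  return total
end

local function sum_with_locals()
  local total = 0                -- local: held in a VM register
  for i = 1, N do
    total = total + i
  end
  return total
end

-- Call f once and return its result together with the elapsed CPU time.
local function time_it(f)
  local start = os.clock()
  local result = f()
  return result, os.clock() - start
end

local g_result, g_time = time_it(sum_with_globals)
local leaked = rawget(_G, "total")   -- the global is still set after the call
total = nil                          -- remove the leaked global
local l_result, l_time = time_it(sum_with_locals)

print(string.format("globals: %.4f sec, locals: %.4f sec", g_time, l_time))
print("global 'total' left behind by the first version:", leaked)
```

Both versions return the same sum, but only the global version changes state that other scripts can see or overwrite.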
Testing Procedure and the Test Machine
The test machine was chosen to be representative of a typical "fast" machine in use by Mira users. This machine uses a 3.0 GHz Intel Core 2 Duo E6850 CPU with a 1333 MHz front-side bus and 4 GB of 800 MHz DDR2 RAM. The operating system was Windows XP SP3. Applications that were also open during these tests included Visual Studio 2008, Calculator, Outlook, Windows Explorer, and Mira MX Ultimate Edition. To increase the significance of the benchmarks, most procedures were repeated in a loop of 10 to 10000 cycles and the measured time was divided by the number of cycles. Each timing was then repeated 3 to 10 times and the typical value, rather than the lowest value, was adopted as the benchmark.
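The procedure described above can be approximated in standard Lua as follows. This is a sketch, not the harness used to produce the table: the function names are hypothetical, os.clock measures CPU time, and taking the median as the "typical" value is an assumption, since the text does not say how the typical value was selected.

```lua
-- Run f() `cycles` times and return the average time per call in seconds.
local function time_per_call(f, cycles)
  local start = os.clock()
  for _ = 1, cycles do
    f()
  end
  return (os.clock() - start) / cycles
end

-- Repeat the timing `trials` times and return the median timing together
-- with the sorted list of all timings.
-- ASSUMPTION: the median stands in for the "typical" value.
local function typical_time(f, cycles, trials)
  local times = {}
  for i = 1, trials do
    times[i] = time_per_call(f, cycles)
  end
  table.sort(times)
  return times[math.floor((trials + 1) / 2)], times
end

-- Example workload: 10,000 additions on a local value per call.
local function workload()
  local k = 0
  for i = 1, 10000 do
    k = k + i
  end
  return k
end

local typical, times = typical_time(workload, 100, 5)
print(string.format("typical: %.6f sec per call (fastest: %.6f)",
                    typical, times[1]))
```

Reporting a middle value rather than the fastest one gives a figure closer to what a script sees in ordinary use, when other applications are competing for the CPU.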