Average and standard deviation
While working on the climate data processing programs I needed a tool to quickly calculate average and sigma of
large sets of data in ascii format. Using my (t)rusty old HP 11C is not an option. One error while typing a number
and you have to start all over.
So I wrote a small program in Modula-2: avgstd.mod Below is the source.
Avgstd.mod
MODULE avgstd;
IMPORT ASCII, Arguments, TextIO, InOut, MathLib, Strings;
VAR inf : TextIO.File;
fileName : ARRAY [0..63] OF CHAR;
x, sx, sx2 : REAL;
n : CARDINAL;
ch : CHAR;
PROCEDURE init;
VAR count : SHORTCARD;
buffer : Arguments.ArgTable;
BEGIN
Arguments.GetArgs (count, buffer);
IF count = 1 THEN
InOut.WriteString ("Please specify a file to read");
InOut.WriteLn;
HALT
END;
Strings.Assign (fileName, buffer^ [1]^);
TextIO.OpenInput (inf, fileName);
IF NOT TextIO.Done () THEN
InOut.WriteString ("Could not open file ");
InOut.WriteString (fileName);
InOut.WriteString (", aborting now");
InOut.WriteLn;
HALT
END;
n := 0;
sx := 0.0;
sx2 := 0.0
END init;
BEGIN
init;
LOOP
TextIO.GetChar (inf, ch);
IF ch = '#' THEN
REPEAT TextIO.GetChar (inf, ch) UNTIL ch = ASCII.LF
END;
TextIO.GetReal (inf, x);
IF NOT TextIO.Done () THEN EXIT END;
INC (n);
sx := sx + x;
sx2 := sx2 + x * x
END;
TextIO.Close (inf);
InOut.WriteString ("Samples = "); InOut.WriteCard (n, 5); InOut.WriteLn;
InOut.WriteString ("Average = ");
InOut.WriteReal (sx/MathLib.real (n), 10, -3);
InOut.WriteLn;
InOut.WriteString ("Std dev = ");
InOut.WriteReal (MathLib.sqrt ((sx2 * MathLib.real (n) - sx * sx)/MathLib.real (n * (n - 1))), 10, -3);
InOut.WriteLn;
InOut.WriteLn
END avgstd.
It's a small source. The algorithm may be interesting since it is single pass data processing. The formula is in
the picture above.
Data file format
The input file format is easy:
# commentaar 1 2.0 .474 3.0 4.0 5.0 6.0
See it run
If you feed the above datafile into avgstd, you get this:
jan@oxygen ~/Modula/klim$ ./avgstd test Samples = 7 Average = 3.068E0 Std dev = 2.055E0But you might as well use:
jan@oxygen ~/Modula/klim$ cat test | ./avgstd - Samples = 7 Average = 3.068E0 Std dev = 2.055E0