Homework 2, Statistics.
Short Description:
Write a program which will perform elementary statistical analysis on a file of numbers.
This assignment is worth 100 points.
Goals
When you finish this homework, you should:
- Have declared and manipulated one-dimensional arrays.
- Have passed one-dimensional arrays as parameters to routines.
- Have written simple sorting and searching routines.
Formal Description
Write a program which, given a data file, will compute basic statistics on the data in the file. The program will also support some data query functions.
The program must compute the following measures (for valid data):
- Number of valid data items.
- Maximum and minimum value of the data.
- Midrange of the data
- Median (or middle number)
- Mean ((1/n) × Σ x_i)
- Mode (number that occurs most frequently).
- Range (maximum - minimum)
- Standard Deviation (sqrt((1/n) × Σ (x_i - μ)^2), where μ is the mean)
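The measures above can be sketched as small routines that take a one-dimensional array and its length. This is only a minimal sketch, not a required design: the function names are hypothetical, n is assumed to be at least 1, midrange uses the usual definition (minimum + maximum)/2, the median of an even number of items is assumed to be the average of the two middle values, and ties for the mode are assumed to go to the smallest value. The routines ending in OfSorted expect the data to be in ascending order already.

```cpp
#include <cmath>

// Mean: (1/n) * sum of x_i. Assumes n >= 1.
double mean(const int data[], int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += data[i];
    return sum / n;
}

// Standard deviation: sqrt((1/n) * sum of (x_i - mu)^2), mu being the mean.
double standardDeviation(const int data[], int n) {
    double mu = mean(data, n);
    double sumSquares = 0.0;
    for (int i = 0; i < n; ++i) {
        double diff = data[i] - mu;
        sumSquares += diff * diff;
    }
    return std::sqrt(sumSquares / n);
}

// Range: maximum - minimum (last and first items of ascending data).
int rangeOfSorted(const int sorted[], int n) {
    return sorted[n - 1] - sorted[0];
}

// Midrange: (minimum + maximum) / 2.
double midrangeOfSorted(const int sorted[], int n) {
    return (static_cast<double>(sorted[0]) + sorted[n - 1]) / 2.0;
}

// Median: the middle item; for even n, the average of the two middle
// items (an assumption, the handout only says "middle number").
double medianOfSorted(const int sorted[], int n) {
    if (n % 2 == 1) return sorted[n / 2];
    return (static_cast<double>(sorted[n / 2 - 1]) + sorted[n / 2]) / 2.0;
}

// Mode: value with the longest run of equal neighbors in ascending data.
// Ties go to the smallest value (an assumption).
int modeOfSorted(const int sorted[], int n) {
    int best = sorted[0];
    int bestCount = 1;
    int run = 1;
    for (int i = 1; i < n; ++i) {
        run = (sorted[i] == sorted[i - 1]) ? run + 1 : 1;
        if (run > bestCount) {
            bestCount = run;
            best = sorted[i];
        }
    }
    return best;
}
```

Working from a sorted copy keeps the median, mode, minimum and maximum simple, and the same sorted copy can serve the ordered-list queries.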
The program must also support the following interactive actions:
- Given a number, find the position of this number within the original data and within the ordered data.
- Given a number, determine how frequently this number occurs in the data.
- Print the original data list
- Print an ordered list of the data.
- Given a number of bins, print a histogram of the data.
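The position and frequency queries above might be handled by routines like the following. This is a sketch under assumptions: the function names are hypothetical, positions are reported as 0-based array indexes, the first occurrence is reported when a number appears more than once, and -1 means the number was not found. A linear search is used for the original data and a binary search for the ordered data.

```cpp
// Linear search: index of the first occurrence of target, or -1.
int findFirst(const int data[], int n, int target) {
    for (int i = 0; i < n; ++i) {
        if (data[i] == target) return i;
    }
    return -1;
}

// Binary search on ascending data: index of the first occurrence of
// target, or -1. Searches the half-open range [lo, hi).
int findFirstSorted(const int sorted[], int n, int target) {
    int lo = 0;
    int hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (sorted[mid] < target) {
            lo = mid + 1;   // target, if present, lies to the right of mid
        } else {
            hi = mid;       // sorted[mid] >= target: keep mid as a candidate
        }
    }
    return (lo < n && sorted[lo] == target) ? lo : -1;
}

// Frequency: how many items in data equal target.
int countOccurrences(const int data[], int n, int target) {
    int count = 0;
    for (int i = 0; i < n; ++i) {
        if (data[i] == target) ++count;
    }
    return count;
}
```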
Input
The program should begin by asking the user for the name of an input file.
The input file contains at most 10,000 valid integers. A valid number
starts with a digit or a '-' and otherwise contains only the digits 0 through 9.
The file may contain any number of invalid entries. Invalid entries contain digits as well as other characters.
Entries in the file are separated by white space.
Consider the following data file:
1 3.2 4 a5 6c
7
3
-2 8-
This file contains the following valid data: 1, 4, 7, 3, -2
This file contains the following invalid data: 3.2, a5, 6c, 8-
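One way to apply the validity rule is to read the file one whitespace-separated entry at a time as text and check each entry before converting it. The sketch below assumes hypothetical names (isValidNumber, readValidData, MAX_ITEMS), treats a lone '-' as invalid, and assumes every valid entry fits in an int.

```cpp
#include <cctype>
#include <sstream>   // std::istream, std::istringstream
#include <string>

const int MAX_ITEMS = 10000;   // at most 10,000 valid numbers

// True if token is an optional leading '-' followed by one or more digits.
bool isValidNumber(const std::string& token) {
    std::size_t start = (!token.empty() && token[0] == '-') ? 1 : 0;
    if (start == token.size()) return false;   // empty entry or a lone '-'
    for (std::size_t i = start; i < token.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(token[i]))) return false;
    }
    return true;
}

// Reads whitespace-separated entries from in, stores the valid ones in
// data (up to capacity items), and returns how many were stored.
int readValidData(std::istream& in, int data[], int capacity) {
    int count = 0;
    std::string token;
    while (count < capacity && in >> token) {
        if (isValidNumber(token)) {
            data[count] = std::stoi(token);   // assumes the value fits in an int
            ++count;
        }
    }
    return count;
}
```

Because the routine takes a std::istream, it can be given an opened std::ifstream for the user's file or a std::istringstream when testing.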
Output
Your program should begin by printing the required statistics, clearly labeled, in the order given. After this, your program should present the user with a menu of other possible queries. After a query is selected, the program should prompt for any additional input required, process the query, display the results, and redisplay the menu.
A histogram can be displayed using the following technique:
- Find the number of potentially different numbers in the data (high-low+1)
- Divide these numbers into a set of bins; each bin covers (high-low+1)/bins consecutive values.
- Count the number of numbers in each bin.
- Normalize each bin by dividing by the total number of numbers.
- Scale each bin by 20, rounding up.
- Draw the histogram, labeling lines 1, 5, 10, 15, and 20 on the y axis.
- Label each bin on the x axis.
- Print the bin ranges below the histogram.
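The counting, normalizing and scaling steps above could be sketched as follows. The function name and parameter list are hypothetical. When (high-low+1) is not an exact multiple of the number of bins, the sketch rounds the bin width up so that every value lands in some bin; that choice is an assumption, since the technique above does not say how to handle a remainder. It also assumes n and bins are both at least 1.

```cpp
// Computes the height (number of '*' characters, 0 to 20) of each bin.
// data holds n values between low and high inclusive; heights must have
// room for bins entries.
void histogramHeights(const int data[], int n, int low, int high,
                      int bins, int heights[]) {
    int span = high - low + 1;               // potentially different numbers
    int width = (span + bins - 1) / bins;    // values per bin, rounded up (assumption)

    // Count the number of values that fall in each bin.
    for (int b = 0; b < bins; ++b) heights[b] = 0;
    for (int i = 0; i < n; ++i) {
        int b = (data[i] - low) / width;
        ++heights[b];
    }

    // Normalize by n and scale by 20, rounding up, using integer math:
    // ceil(20 * count / n) == (20 * count + n - 1) / n.
    for (int b = 0; b < bins; ++b) {
        heights[b] = (20 * heights[b] + n - 1) / n;
    }
}
```

Doing the rounding with integer arithmetic avoids floating-point surprises, such as a product that should be exactly 6 coming out slightly above 6 and being rounded up to 7.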
Consider the following:
data = 2 3 5 5 6 8 8 8 8 10
Number of bins = 3
Different numbers = 10-2+1 = 9
Bins are 2 to 4, 5 to 7 and 8 to 10
Bin 1 contains 2 numbers (2,3)
Bin 2 contains 3 numbers (5,5,6)
Bin 3 contains 5 numbers (8,8,8,8,10)
Normalized
Bin 1: 2/10 = .2 x 20 = 4
Bin 2: 3/10 = .3 x 20 = 6
Bin 3: 5/10 = .5 x 20 = 10
The y axis labels
Since there are 10 numbers total, each * in the
histogram represents 1/2 a number.
Line 20: 10 numbers
Line 15: 7.5 numbers
Line 10: 5 numbers
Line 5: 2.5 numbers
Line 1: .5 numbers
Output
 10 |
    |
    |
    |
    |
7.5 |
    |
    |
    |
    |
  5 |     *
    |     *
    |     *
    |     *
    |   * *
2.5 |   * *
    | * * *
    | * * *
    | * * *
 .5 | * * *
    +-------
Bin # 1 2 3
Bin 1 : 2 to 4
Bin 2 : 5 to 7
Bin 3 : 8 to 10
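Once the scaled bin heights are known (4, 6 and 10 in the example), the picture can be built one row at a time from line 20 down to line 1. The sketch below is one possible layout, not a required format: the function name, the column widths and the use of a returned string are assumptions, labels are printed in the default floating-point format (so .5 appears as 0.5), and the bin-number row only lines up for fewer than 10 bins.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Builds the histogram as text. heights[b] is the number of '*' for bin b
// (0 to 20) and total is the number of data items.
std::string drawHistogram(const int heights[], int bins, int total) {
    std::ostringstream out;
    for (int line = 20; line >= 1; --line) {
        // Label lines 20, 15, 10, 5 and 1 with the count they represent.
        if (line == 1 || line % 5 == 0) {
            out << std::setw(6) << line * total / 20.0;
        } else {
            out << std::setw(6) << "";
        }
        out << " |";
        // A bin shows a '*' on this line if it is at least this tall.
        for (int b = 0; b < bins; ++b) {
            out << ' ' << (heights[b] >= line ? '*' : ' ');
        }
        out << '\n';
    }
    // The x axis, then the bin numbers under their columns.
    out << std::setw(6) << "" << " +" << std::string(2 * bins + 1, '-') << '\n';
    out << std::setw(6) << "Bin #" << "  ";
    for (int b = 0; b < bins; ++b) {
        out << ' ' << (b + 1);
    }
    out << '\n';
    return out.str();
}
```

Returning a string rather than printing directly keeps the routine easy to test; the caller can send the result to std::cout and then print the bin ranges.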
When a dataset is displayed, it should be broken into a number of lines, each
line no more than 80 characters wide. Numbers should not be split in the middle.
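The 80-character rule can be met by tracking the length of the current line and starting a new one whenever the next number would not fit. A minimal sketch, assuming a hypothetical formatData routine that separates numbers with single spaces and returns the text to print:

```cpp
#include <sstream>
#include <string>

// Formats the n values in data as lines no wider than maxWidth characters,
// never splitting a number across lines.
std::string formatData(const int data[], int n, int maxWidth = 80) {
    std::ostringstream out;
    int lineLength = 0;   // characters already on the current line
    for (int i = 0; i < n; ++i) {
        std::string text = std::to_string(data[i]);
        int length = static_cast<int>(text.size());
        // Start a new line if a space plus this number would not fit.
        if (lineLength > 0 && lineLength + 1 + length > maxWidth) {
            out << '\n';
            lineLength = 0;
        }
        if (lineLength > 0) {
            out << ' ';
            ++lineLength;
        }
        out << text;
        lineLength += length;
    }
    if (n > 0) out << '\n';
    return out.str();
}
```

The same routine works for both the original list and the ordered list, since it only needs an array and a count.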
Discussion
Your program should be modular in design. You should employ simple routines
which are well documented.
You should begin working on the parts of this program you can accomplish now: reading in data and printing out data.
Your sorting and searching routines should be contained in their own files. You should provide .h files to accompany these files.
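As an illustration of this layout, a sorting routine and its header might look like the sketch below. The file names sort.h and sort.cpp, the routine name selectionSort and the choice of selection sort are all assumptions; any simple sort would do. The two files are shown together in one listing, with comments marking where each would begin.

```cpp
// ---- sort.h (hypothetical file name) ----
#ifndef SORT_H
#define SORT_H

// Sorts the first n items of data into ascending order, in place.
void selectionSort(int data[], int n);

#endif

// ---- sort.cpp (hypothetical file name) ----
// As a separate file, this would begin with: #include "sort.h"

void selectionSort(int data[], int n) {
    for (int i = 0; i < n - 1; ++i) {
        // Find the smallest item in data[i..n-1].
        int smallest = i;
        for (int j = i + 1; j < n; ++j) {
            if (data[j] < data[smallest]) smallest = j;
        }
        // Swap it into position i.
        int temp = data[i];
        data[i] = data[smallest];
        data[smallest] = temp;
    }
}
```

Because the routine sorts in place, the program would sort a copy of the data so the original order remains available for the queries that need it.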
Required Files
Source code, a Makefile which builds the entire project, and a README file which provides project documentation.
Submission
Email a tar file containing all required files to dbennett@edinboro.edu by October 6 at class time.