Homework 1, The Sorting Test.

Short Description:

Design and implement an experimental framework that allows you to test the performance of sorting functions. Use this framework to test the standard library's sort and stable_sort functions. In addition, use the framework to compare the performance of the vector and list containers when sorting.

Goals

When you finish this homework, you should have:

Formal Description

Using the tools and methods discussed in class, design and implement an experimental framework to test the performance of the standard library's sort and stable_sort functions.

We will learn more about what you might test as the semester goes along, but for now it would be good to have empirical measures of the average number of swaps, the average number of comparisons and the average run time for these two sorts. To do this, you should:

  1. Generate a random set of data.
  2. Sort this vector.
    1. Measure the amount of time it took to sort the data. Time the sort only, not the data generation or any other setup.
    2. Count the number of comparisons done by the sort.
    3. Count the number of times the assignment operator was used by the sort.
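
One way to time the sort and nothing else is sketched below. It is only a sketch: the function name time_sort_seconds is made up for illustration, and steady_clock is one reasonable choice of clock among those in the chrono library.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: returns the seconds spent inside std::sort only.
// The data is taken by value, so the copy is made before the clock starts
// and the caller's vector stays unsorted for the next experiment.
double time_sort_seconds(std::vector<int> data) {
    const auto start = std::chrono::steady_clock::now();
    std::sort(data.begin(), data.end());
    const auto stop = std::chrono::steady_clock::now();

    // Sanity check, deliberately placed outside the timed region.
    assert(std::is_sorted(data.begin(), data.end()));

    return std::chrono::duration<double>(stop - start).count();
}
```

For small inputs the returned value may be very close to the resolution of the clock, which is one reason to experiment with larger data sizes.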

Explicitly, you should use the sort and stable_sort algorithms from the standard library. In addition, you should use the vector and list classes to store your data. More on that later. I also expect you to use the chrono and random libraries discussed in class. You may use my tools directly or modify them to suit your tastes. You should be familiar with the list and vector classes. We will discuss the random and chrono libraries. If you need help with the sort algorithms, please ask.

When generating your data, make sure that there are not many duplicates. Generating 10,000 integer values between 1 and 10 will yield greatly different results than generating 10,000 integer values between 1 and 1,000,000,000. Please do the latter.
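
A minimal sketch of generating such data with the random library follows. The function name make_random_data is hypothetical, and the choice of the mt19937 engine is an assumption; any engine from the library would serve.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// The upper bound must fit in an int on the platform being used.
static_assert(std::numeric_limits<int>::max() >= 1000000000,
              "int is too small for the requested range");

// Hypothetical helper: n integers drawn uniformly from 1 to 1,000,000,000.
// With a range this wide, duplicates are rare for the sizes tested here.
std::vector<int> make_random_data(std::size_t n, std::mt19937& gen) {
    std::uniform_int_distribution<int> dist(1, 1000000000);
    std::vector<int> data;
    data.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        data.push_back(dist(gen));
    }
    return data;
}
```

Seeding the engine from std::random_device gives different data on each run, while a fixed seed makes an experiment repeatable. Which one you want depends on what you are testing.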

There are several approaches I can think of to count the comparisons and assignments in step 2 above. A possible solution is to build a class which holds an integer data value and has overloaded less than (<) and assignment (=) operators. Inside each of these, increment a static class variable representing the total number of times that overloaded function was called. Provide routines in the class to read and reset each of these counters. Instead of sorting integers, sort instances of this class with random data values. If you are not familiar with overloaded operators or static class members, ask. (Or go to Gregoire, your book from 330.)
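
The counting class described above might look something like the sketch below. The names Counted, Counts and count_sort are invented for illustration. Note one assumption: only assignments are counted. Sorts also construct temporary copies of elements, and whether those should be counted is a decision left to you.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical class: wraps an int and counts calls to < and =.
class Counted {
public:
    Counted(int v = 0) : value_(v) {}
    Counted(const Counted&) = default;  // copy construction is NOT counted

    // No move operations are declared, so moves done by the sort fall
    // back to this copy assignment and are counted here.
    Counted& operator=(const Counted& other) {
        ++assignments_;
        value_ = other.value_;
        return *this;
    }

    friend bool operator<(const Counted& a, const Counted& b) {
        ++comparisons_;
        return a.value_ < b.value_;
    }

    int value() const { return value_; }

    static long long comparisons() { return comparisons_; }
    static long long assignments() { return assignments_; }
    static void reset() { comparisons_ = 0; assignments_ = 0; }

private:
    int value_;
    static long long comparisons_;
    static long long assignments_;
};

long long Counted::comparisons_ = 0;
long long Counted::assignments_ = 0;

struct Counts {
    long long comparisons;
    long long assignments;
};

// Hypothetical helper: sorts a copy of the data and reports the counts.
Counts count_sort(std::vector<Counted> data) {
    Counted::reset();  // the copy into `data` happened before this point
    std::sort(data.begin(), data.end());
    const Counts result{Counted::comparisons(), Counted::assignments()};
    // Checked only after the counts are read: is_sorted also calls operator<.
    assert(std::is_sorted(data.begin(), data.end()));
    return result;
}
```

The exact counts depend on the standard library implementation, so expect different numbers on different compilers.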

Since a single sample is probably not representative, you should repeat this process a number of times, say 20 to 50, and use these to find the "average" value for all three measurements.
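
Repeating a measurement and averaging it can be separated from the measurement itself. The sketch below assumes a hypothetical helper named mean_over_trials that accepts any callable returning one sample as a double.

```cpp
#include <cassert>

// Hypothetical helper: calls measure() `trials` times and returns the mean.
// `measure` is any callable that performs one experiment (for example,
// generate fresh data, sort it, and return the elapsed seconds).
template <typename Measure>
double mean_over_trials(int trials, Measure measure) {
    assert(trials > 0);
    double total = 0.0;
    for (int t = 0; t < trials; ++t) {
        total += measure();
    }
    return total / trials;
}
```

The same helper can average times, comparison counts or assignment counts, as long as each call to the callable generates new random data rather than sorting data that is already sorted.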

Because we are interested in how the sort behaves as the number of data items increases, you should conduct the above experiment for a number of different data sizes.

Determining the data sizes is a matter of choice. You want them large enough that the time spent in the sort is measurable, but not so large that it will take days to run your experiment. I would use a series of sizes that are powers of two, say starting at $2^5$ and finishing at $2^{10}$. This is a guess as I type the assignment. You should experiment and find a good range of sizes for you, the implementations you are testing and the platform on which you are running your experiment.
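
A short sketch of producing such a series of sizes is given below. The function name power_of_two_sizes is hypothetical, and the bound on the exponent is an arbitrary safety limit chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the sizes 2^lo, 2^(lo+1), ..., 2^hi.
std::vector<std::size_t> power_of_two_sizes(unsigned lo, unsigned hi) {
    assert(lo <= hi && hi < 31);  // arbitrary cap to keep the shift safe
    std::vector<std::size_t> sizes;
    for (unsigned e = lo; e <= hi; ++e) {
        sizes.push_back(std::size_t{1} << e);
    }
    return sizes;
}
```

Calling power_of_two_sizes(5, 10) gives the six sizes 32, 64, 128, 256, 512 and 1024. Changing the two arguments is all that is needed when you find that a different range suits your platform better.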

You should make your framework as modular as possible as you will be performing this analysis on other sorting functions in the future.

For this test only, compare the differences in the sorting implementations when they are asked to sort a vector versus when they are asked to sort a list.
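
One detail matters for this comparison: std::sort and std::stable_sort require random-access iterators, which std::list does not provide, so a list is sorted with its own sort member function. The sketch below shows one way to set up the two cases. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// Sorts a vector with the standard algorithm.
void sort_vector(std::vector<int>& v) {
    std::sort(v.begin(), v.end());
}

// Sorts a list with its member function, which is a stable sort.
// std::sort(l.begin(), l.end()) would not compile for a std::list.
void sort_list(std::list<int>& l) {
    l.sort();
    assert(std::is_sorted(l.begin(), l.end()));
}

// Hypothetical helper: copies the same values into a list so that both
// containers are measured on identical input.
std::list<int> to_list(const std::vector<int>& v) {
    return std::list<int>(v.begin(), v.end());
}
```

Building the list from the vector's data before starting the clock keeps the comparison fair, since both containers then sort exactly the same values.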

Once you have collected your data, analyze it. This can be done in a number of ways. I like using spreadsheets, but if you know R, SPSS, PSPP, data analysis in Python or any other technique, feel free to use it. I am not looking for any deep analysis, but I would like to know why there are two different sorting algorithms, and I would like your experimentation to corroborate the theory we discuss in class. Is there a difference between lists and vectors?

Type your results up in a report. This might follow the traditional IMRaD format (Introduction, Methods, Results and Discussion). This is not a research paper, so you don't need a full lit search, but this is the "traditional" format for a scientific paper. There is a reasonable discussion of this format here. Please be sure to include your data in a table. In addition, please present, and interpret, a series of graphs to explain your data.

Other discussion

Required Files

A single tar or zip file containing the source code, a makefile for this program, any scripts used to collect data, the tool you used to analyze your data, and a final report. Please produce your report in Word.

Submission

Submit the assignment to the D2L folder Homework 1 by the due date.