C++ Data Structure

C++ Data Structure

This assignment involves the writing of 2 CPP’s. To it’s clear which CPP goes with which “part” of the assignment, make sure that you have some reference to the part number in your file names (like part1.cpp or timingStudy.1.cpp for example).

INPUT FOR BOTH PARTS

An input file of 1,000,000 randomly generated names (one per line) is posted in this module. There are a few different versions of the file and it will serve as the input to both parts of the assignment. Choose the one that best suits you:

  • Windows users: names_windows.txt
  • Mac or Linux users: names.txt (this has the ^M’s removed)
  • If you use the Cloud9 IDE, there is an upload limit which may prevent you from uploading large files. Use this compressed version, which you will need to unzip after uploading: names.zip

PART 1, PERFORM A SIMPLE TIMING STUDY

In this lab you will apply the “4-cycle timing code” posted to Canvas in this module: timingtests.cpp. The purpose is to prepare you for making “big oh” determinations and confirming them with timing studies.

The provided code reads a predetermined number of lines from the data file into a vector. This can be changed by changing the constant NLINES at the top of the file. You probably won’t need to change this, but keep in mind that the data file itself contains only one million lines so NLINES cannot be larger than that.

Write a C++ console app, applying the timing test code to an operation that requires scanning the entire array. Here are some examples. Try writing a loop that:

  • Finds the shortest name in the array
  • Finds the longest name in the array
  • Counts how many times a particular first name occurs, like “Victoria”

To perform the timing tests, you will set the starting size with the variable n at the top of main(), which controls how many lines your algorithm will use for input. The string bigOh also represents your “guess” about your algorithm.

Start with n = 50,000 for the first cycle, and set bigOh to O(n) — that is, we expect that the time it takes to read the file is directly proportional to the number of lines read from the file.

In each of the 4 timing cycles:

  1. Start the timer (with clock()).
  2. Run your loop that processes n lines from the vector.
  3. Stop the timer.

It is possible that caching may throw off the timing for the first cycle, so run your program more than once to see it work correctly. Your timings will not be an exact match for the ‘expected’ amounts each time, but should be close.

Example Output

0.0002 (expected O(n)) for n=50000
0.0004 (expected 0.0004) for n=100000
0.0008 (expected 0.0007) for n=200000
0.0015 (expected 0.0015) for n=400000

As n increases each time through the loop, you should see the timing increase proportionally, i.e. doubling the input should double the processing time.

If you’re seeing numbers that don’t make sense, your computer may be too fast! Try increasing NLINES and the initial ‘n’ to a larger number.

PART 2, BENCHMARK A NESTED LOOP SORT

In the second part you will make your own determination for the big oh of nested for-loop sorting of an array, and confirm your determination.

The nested for()-loop sort should use to this algorithm (with n as the number of strings):

for (int i = 0; i < n; i++)
    for (int j = i + 1; j < n; j++)
      if (a[j] < a[i])
        swap(a[i], a[j]);

Write a C++ console app, applying the same timing test code from the module, to sort the names in your vector in ascending order. You decide what you expect the “big oh” to be, and what n to use for the first cycle.

Write your app to do the following:

  1. Start the timer, perform the nested for()-loop sort, and stop the timer.
  2. Write code to verify that each string in the array is greater or equal to the string before it, starting with the 2nd string. Use assert, like this:
for (int i = 1; i < n; i++)
    assert (a[i - 1] <= a[i]);

To use assert() you may need to include the header file <cassert>. Assertions are used to abort the program if some unknown condition occurs, and are mainly enabled in “debug” or “developer” builds of the code, where the programmer wants to catch abnormal conditions as soon as possible. In “production” code (deployed into the real world), assertions are usually removed.

  1. Output the results, which will look similar to part 1 but with different numbers.

SAMPLE ASSIGNMENT

Sample-2

Powered by WordPress