STA 247 ASSIGNMENT 2 - DISCUSSION

The output shows the results from four simulation runs, with different
random number seeds.  

As can be seen, the scheme with step=20 works much better than the
scheme with step=1.  The average number of probes needed with step=20 
is about half that needed with step=1.  The probability of needing
more than 25 probes is about 0.1 with step=1, but only about 0.005
with step=20, which is an even bigger difference.  Using step=20 
seems to be better in all circumstances, but especially so if you 
are concerned about some lookups taking a very long time.

One can see this difference in the histograms as well.  The histograms
of the 50 table means show that much of the variation in the number of
probes when step=1 comes from differences in how the table happens to
be set up with its 160 keys, since the means of the 200 lookups for
each table are much more variable when step=1.  

One can get an idea about the accuracy of the estimates by comparing
the four simulation runs.  Using just one of these runs, one can
estimate the accuracy of an estimate that is an average of INDEPENDENT
values by its standard error: the sample standard deviation of those
values divided by the square root of the number of values.  For this
problem, the numbers of probes for all 10000 lookups are NOT
independent, since the 200 lookups for a table all share the same
table setup, but the mean numbers of probes for the 50 tables created
ARE independent.  So we can get an idea of accuracy from the sample
standard deviation of the 50 table means, divided by the square root
of 50.  These standard errors are themselves variable, however, as
can be seen by looking at the four replications.
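
The standard error calculation described above can be sketched in
Python as follows.  This is only an illustration, not the program used
to produce the output: the function names are made up for this sketch,
and the probe counts are artificial stand-in data (a random effect for
each table plus noise for each lookup), not results from the actual
simulation.  It groups 10000 probe counts into 50 tables of 200
lookups, computes the standard error from the 50 table means, and
compares it with the naive figure that wrongly treats all 10000
lookups as independent.

```python
import math
import random
import statistics


def standard_error(values):
    """Sample standard deviation divided by sqrt(n).

    Only meaningful as a measure of accuracy when the values
    are independent of each other.
    """
    return statistics.stdev(values) / math.sqrt(len(values))


def table_means(probes, lookups_per_table):
    """Average consecutive groups of lookups, one group per table."""
    return [
        statistics.mean(probes[i:i + lookups_per_table])
        for i in range(0, len(probes), lookups_per_table)
    ]


# ASSUMPTION: artificial stand-in data, not the assignment's results.
# Each table gets its own random effect, so lookups within a table
# are correlated, as they are in the real simulation.
rng = random.Random(1)
n_tables, lookups_per_table = 50, 200
probes = []
for _ in range(n_tables):
    table_effect = rng.gauss(5.0, 1.5)
    for _ in range(lookups_per_table):
        probes.append(table_effect + rng.gauss(0.0, 2.0))

means = table_means(probes, lookups_per_table)

estimate = statistics.mean(means)   # same as the mean of all 10000 values
se_tables = standard_error(means)   # valid: the 50 means are independent
se_naive = standard_error(probes)   # invalid: ignores within-table dependence

print(f"estimate of mean probes:       {estimate:.3f}")
print(f"standard error from 50 means:  {se_tables:.3f}")
print(f"naive standard error (wrong):  {se_naive:.3f}")
```

With data of this kind the naive figure comes out much smaller than
the one based on table means, so treating the individual lookups as
independent would overstate the accuracy of the estimate.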