
Problem set #2

The following problems require you to compute least squares fits and linear interpolants. Many software packages will perform basic operations for you. Maple and Matlab are examples, but even simple plotting packages often have ``linear regression'' subroutines that will find the best fit for you.

1.
Download dataset A from the course web site. Find an interpolating polynomial through this dataset, and record the coefficients. Find the best fit of this data for linear, quadratic, cubic, fourth and fifth order polynomials and record the root mean squared errors. Assuming the data came from a systematic source, f(x), with random noise added, speculate on the form of the underlying system. Justify your answer.



  
Figure 1: Best fit polynomials and RMS errors for dataset A. The interpolating polynomial is the fifth order best fit because there are only six data points.
\resizebox{3in}{!}{\includegraphics{prob1.eps}}

The solutions are shown in Figure 1. Notice that the cubic, quartic and quintic polynomials all perform reasonably well. The linear and quadratic fits look noticeably bad, and the RMS error plunges when going from quadratic to cubic. Of course, if f(x) were cubic, the quartic and quintic fits would look good too. Thus, one suspects that f(x) might be cubic and that the subsequent improvement comes from fitting the curve to the noise. These are the hazards of having a small sample. Investigators who fall into this trap tend to find that they can always fit their model to the data.
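The computation behind Figure 1 can be sketched as follows. This is a minimal illustration rather than the course's solution code: dataset A is not reproduced here, so the six points below are hypothetical (a cubic plus small Gaussian noise), and the helper names `polyfit`, `polyval` and `rms_error` are chosen for this sketch. The fit solves the normal equations directly, which is adequate for low degrees on a small interval.

```python
import math
import random

def polyfit(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... from the normal equations."""
    n = degree + 1
    # Normal equations A c = b with A[j][k] = sum x^(j+k), b[j] = sum y * x^j.
    A = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(A[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - s) / A[row][row]
    return coeffs

def polyval(coeffs, x):
    """Evaluate the polynomial by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def rms_error(coeffs, xs, ys):
    """Root mean squared residual of the fit over the sample."""
    total = sum((polyval(coeffs, x) - y) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(total / len(xs))

# Hypothetical stand-in for dataset A: six noisy samples of a cubic.
random.seed(0)
xs = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]
ys = [x ** 3 - x + random.gauss(0.0, 0.05) for x in xs]

rms = {d: rms_error(polyfit(xs, ys, d), xs, ys) for d in range(1, 6)}
for d in range(1, 6):
    print(f"degree {d}: RMS error = {rms[d]:.3e}")
```

With six points, the degree 5 fit has six coefficients and therefore passes through every point, so its RMS error is zero up to rounding. That is why a small RMS error alone says little about the underlying f(x).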


2.
Download dataset B from the course web site. This is a second set of measurements from the same source. Find an interpolating polynomial through these points and compare with the interpolating polynomial from dataset A. Are the interpolants consistent in any way?



  
Figure 2: These are the interpolating polynomials for dataset A compared to dataset B.
\resizebox{3in}{!}{\includegraphics{prob2.eps}}

In Figure 2, we see that the two interpolants are not very close, though several students noticed that the maxima and minima occur at similar points. Thus, we see that interpolation can be rather erratic, especially with a paucity of data. Notice too that the ``bad'' behavior occurs where the gaps in the data are greatest.
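The sensitivity of interpolation in a gap can be illustrated with a short sketch. Everything here is hypothetical: the cubic, the node locations (with a deliberate gap between -0.6 and 0.4) and the fixed perturbations of size 0.05 stand in for datasets A and B, which are not reproduced in this document. The interpolant is evaluated with the Lagrange formula.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the unique interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total

def f(x):
    # Hypothetical underlying system, assumed for this sketch only.
    return x ** 3 - x

# Six nodes with a large gap between -0.6 and 0.4.
nodes = [-1.0, -0.8, -0.6, 0.4, 0.8, 1.0]
noise = [0.05, -0.05, 0.05, -0.05, 0.05, -0.05]

# Two "measurements" of the same source that differ by 0.1 at every node.
ys_a = [f(x) + e for x, e in zip(nodes, noise)]
ys_b = [f(x) - e for x, e in zip(nodes, noise)]

x_gap = -0.1  # a point inside the gap
gap_diff = lagrange_eval(nodes, ys_a, x_gap) - lagrange_eval(nodes, ys_b, x_gap)
print("difference at the nodes: 0.1 in magnitude")
print(f"difference inside the gap at x = {x_gap}: {gap_diff:.4f}")
```

In this constructed example the two interpolants disagree by more than twice as much inside the gap as they do at the measured points, which is the same kind of behavior seen in Figure 2.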


3.
Download dataset C from the course web site. This is a third set of measurements from the same source. Find the best fit of this data for linear, quadratic, cubic, fourth and fifth order polynomials and record the root mean squared errors. Based solely on the information from this third dataset, speculate on the form of the underlying system. Compare these results with those from dataset A. Justify your answer.



  
Figure 3: Best fit polynomials for dataset C.
\resizebox{3in}{!}{\includegraphics{prob3.eps}}

In Figure 3, we see that we have a large sample. When examining the RMS errors, we find that there is a drop between quadratic and cubic. The steady RMS errors beyond this point can be attributed to the noise. If we used very high order polynomials, so that the number of coefficients was close to the sample size, we would find that we start fitting the curve to the noise again.
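The plateau can be made quantitative under assumptions that the problem does not state explicitly: suppose the N measurements are y_i = f(x_i) + e_i at distinct x_i, the noise terms e_i are independent with zero mean and variance sigma^2, and f is a polynomial of degree at most n. Writing p_n for the least squares polynomial of degree n (so n+1 coefficients, with N >= n+1), the standard result for linear least squares gives:

```latex
\[
  \mathrm{RMS}_n = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - p_n(x_i)\bigr)^2},
  \qquad
  E\bigl[\mathrm{RMS}_n^2\bigr] = \sigma^2\left(1 - \frac{n+1}{N}\right).
\]
```

For a large sample and modest n, the factor in parentheses is close to 1, so the RMS error sits near the noise level sigma once the degree is high enough to capture f. As n+1 approaches N the expected error falls to zero, which is the fit to the noise. With N = 6 and n = 5 it is exactly zero, consistent with the interpolating polynomial of dataset A.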


4.
Assuming that both datasets came from the same source, what is your best guess of the underlying system, f(x)? Justify your answer.


The results from questions #1 and #3 both point to a cubic form. I wrote the problem, so my opinion is irrelevant.


5.
The following is a true story. My neighbor, Mrs. X [not her real name, of course], approached me for some help in a lawsuit which she had brought against her former employers. She had suffered serious health problems and attributed them to a faulty furnace at her former place of employment. The furnace had cracked and was leaking carbon monoxide (CO) into the building. Continual exposure to carbon monoxide can cause a wide variety of health problems. An environmental consulting firm hired by the defendants measured the extent of the leak in the following way. They turned off the furnace and allowed the building to vent for many hours.
 
Table 1: Tables of first (left) and second (right) carbon monoxide measurements.
First test (left):
\begin{tabular}{|l|l|}
\hline Time (min) & ... \\ \hline
... & ... \\ \hline
60 & 19.1 \\ \hline
70 & 19.8 \\ \hline
80 & 20.5 \\ \hline
\end{tabular}

Second test (right):
\begin{tabular}{|l|l|}
\hline Time (min) & ... \\ \hline
... & ... \\ \hline
50 & 21.9 \\ \hline
60 & 22.2 \\ \hline
70 & 22.3 \\ \hline
\end{tabular}

Then, they turned on the furnace and measured the carbon monoxide levels at a duct to a common area at regular time intervals (see Table 1). After this test, they turned off the furnace, waited a few hours, and ran a second test (see Table 1 again). I ask you the same questions that my neighbor asked me at the time.

In addition, I ask you the following question: Assuming the employer repaired the furnace after the tests so that no further measurements were possible, what additional information would you want or need to have to confidently answer the questions above?


The whole point behind modeling is to gain some understanding of the problem and quantify it. In this case, one would learn something about diffusion in a duct, and perhaps about furnaces, and see whether one could build a general solution. Then, one would try to determine any unknown coefficients by fitting this solution to the known data.
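As a purely illustrative sketch of that last step, suppose the physics suggested a saturating exponential, y(t) = c_inf - a exp(-k t), for the reading at the duct. This form is an assumption made for the example, not a model given in the problem, and the function name `fit_saturating_exponential` is hypothetical. Three equally spaced readings then determine the three unknown coefficients exactly. The values used below are the last three rows visible in the second test of Table 1; the earlier rows and the units are not available here.

```python
import math

def fit_saturating_exponential(ts, ys):
    """Fit y(t) = c_inf - a*exp(-k*t) exactly through three equally spaced points."""
    (t1, t2, t3), (y1, y2, y3) = ts, ys
    if not math.isclose(t2 - t1, t3 - t2):
        raise ValueError("times must be equally spaced")
    d1, d2 = y2 - y1, y3 - y2
    if d1 == 0 or not 0 < d2 / d1 < 1:
        raise ValueError("data are not consistent with a saturating exponential")
    r = d2 / d1                       # equals exp(-k * (t2 - t1))
    k = -math.log(r) / (t2 - t1)      # rate constant
    c_inf = y1 + d1 / (1 - r)         # long-time limit of the model
    a = (c_inf - y1) * math.exp(k * t1)
    return c_inf, a, k

# Last three visible rows of the second test in Table 1.
ts = [50, 60, 70]
ys = [21.9, 22.2, 22.3]

c_inf, a, k = fit_saturating_exponential(ts, ys)
print(f"c_inf = {c_inf:.3f}, k = {k:.4f} per minute")
```

Under this assumed model the readings level off at about 22.35, with a rate constant of about 0.11 per minute. Because three points fix three coefficients exactly, the fit cannot test the model at all, which is one reason additional measurements or independent physical information would be needed to answer with any confidence.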




 
Louis F Rossi
2001-10-17