MATH 1350 Calculus I - Lab 2 - Fitting Functions to Data

The question of modeling empirical data is a very broad subject that involves finding an equation that best describes data that was obtained experimentally. The quality of our model is to some extent judged by its ability to make accurate predictions. Study in this area ranges from looking at methods which generate the best model for a given set of data to looking at what constitutes a good set of data, i.e., does the data allow us to construct a model which can make accurate predictions? It would be impossible to present an introductory treatment of this area in one lab. Instead, we will look at how we can use Maple to model a set of data with a polynomial or exponential function. The criteria that Maple uses to determine a "best fit" function is the method of least squares.

In principle, the method of least squares works by trying to minimize the error between the predicted values of the model and the actual data. In words, we are attempting to find a function such that the sum of the squares of the distances between the actual and predicted values is minimized. It may be that all of the data points actually lie on the graph of our model or it may be that none the points actually lie on the graph. The reason that we try to minimize the squares of the individual errors and not just the errors is to avoid the possibility that the errors cancel out, giving us a zero net error when our model may not be very accurate. It should be obvious that when we look at the quantity (f(x)-y)2, where f(x) is the output of our model at x and y is the actual data point, we can write the terms in either order without effecting the result.

Maple has a built-in procedure which allows us to find a polynomial which best fits (in the least squares sense) a given data set. This means that we can employ Maple to find a linear, quadratic, or cubic function which models a set of data. However, if we wish to use an exponential model, it may appear that we are stuck. The least squares procedure only allows us to specify polynomials when fitting data. It turns out that this is only a minor inconvenience; we have the ability to work around this limitation. What we do is linearize our data by transforming our variables. Suppose the data is related according to y=Aekt
for some constants A and k. It can be shown that by applying the natural log to both sides of the equation, we obtain the relationship Y = kT + b where T and Y are our transformed variables, Y = ln y and b = lnA. Maple can then be used to find the linear function which best models the transformed data. Once Maple has determined the constants k and b, we can then transform these values back to the A and k we need for our exponential model. For example, suppose Maple determined that the best linear fit to our transformed data was given by the function Y = 1.423T +0 .738. Since b = 0.738 = lnA, we have that A=e0.738 which is approximately 2.09. So the exponential model which best fits the original data is y=2.09e1.423t .

Before we employ Maple to determine our best fit models, we first must cover some of the basic commands we will have to use. Typically the data we will be modeling will be given in a table and so we will need to employ the Maple "list" structure (see Lab 0 for a brief introduction to this structure). As a consequence, it will be necessary that we become familiar with some of the Maple commands that operate on lists. An introductory section on some these commands comprise Part I of this lab. You should also be familiar with the "plot" command and, in particular, the optional argument, "style=point", that can be specified when using "plot".

Part II: Operations on Lists

Suppose we were given a list containing 5000 data values and we needed to calculate the square root of each value. We could ask Maple to calculate the values as we went through the list entry by entry. This would be a very tedious task. Or, with some limited knowledge about programming, we could write a short program which automated this task . The second approach would be highly preferable to the first, but we would have to be familiar with Maple programming. There is a third approach which is available to us. Maple has built-in procedures which perform operations on lists. While care must be taken when using these procedures, they do allows us to "automate" actions on the elements of a list without requiring that we know how to program. Three useful procedures are the "zip" function, the "map" function, and the "seq" function. We will use the first two of these procedures in the examples that follow.

Suppose we were trying to model the data presented in the following table:

Our goal will be to determine a function which best fits this data. The Maple routine that does this for us requires that we define two separate data lists: one list defining the values taken by the independent variable (the input variable) and the other defining the values of the dependent variable (the output variable). Defining the values in separate lists is to our advantage for other reasons as well. We may need to perform an operation on just one set of values - such as taking the natural log of the y values, for example. By assigning each list a name, we are able to access these values at a later time.

> xdata:=[5.2,5.3,5.4,5.5,5.6];

> ydata:=[27.8,29.2,30.6,32.0,33.4];

If we desired a plot of these points, we could merge the x and y values into one list and then tell Maple to plot this new list. Fortunately, the "zip" command does this for us.

> xydata:=zip((x,y)->[x,y],xdata,ydata);

The new list named "xydata" contains the ordered pairs (x,y) where the xdata list forms the first coordinate and the ydata list forms the second coordinate. We can ask Maple to plot this list:

> plot(xydata,x=5..6,style=point);

Now suppose we are trying to find an exponential function describing this data. We first need to transform the y values before we could ask Maple to find the best linear fit between the transformed variables. This linearization is accomplished by using the "map" command. We define a new list which we name, "ylogdata", which contains the transformed values. The first entry after the map command defines the operation we wish to apply to the elements of the list specified in the second entry. In this case, we wish to apply the natural log to the values in ydata: (Note that "ylogdata" is just a name for our new list.)

> ylogdata:=map(ln,ydata);

Now that we have covered how to enter data, it is time to look at the data modeling process.

Part III: Fitting Functions to Data Using Maple and The Least Squares Method.

There are so many commands in Maple that the program doesn't want to open every command every time you start Maple. In order that the computer can move more quickly, it stores some of its funtions in packages. Its kind of like putting your winter clothes away during the summer so they're not in your way. The best fit function is one of those functions that Maple hides away in a package called "stats." So, in order for us to use the function we'll have to open the stats package. Its like having to open the box to get to your favorite sweater. So, for Maple to load the function that is in the stats package we tell it to go look for it with the following command- this command must be typed at the beginning of every lab in which you hope to use the best- fit function.

> with(stats);

Now that Maple knows of the fit procedure, we can use it to find the best fit line for the original data and the best fit for the transformed data. We'll start by finding a linear model for the original data. We enter the following command. Note that the list containing the values for the independent variable (the input variable) precedes the list for the dependent variable (the output variable).

> fit[leastsquare[[x,y]]]([xdata,ydata]);

The Maple output describes y as a linear function of x. However, we need to perform one more step before we can use this output. We must convert it into a function. We will call this function lin_fit.

> lin_fit:=unapply(rhs(%),x);

Note that what this command says is to create a function that maps x to the right hand side of the previous line ("rhs(%)"). By typing lin_fit to the left of the ":=" we have named the function "lin_fit". Now that we have defined lin_fit, we can plot or use it to make predictions. For example, if we wanted to find the predicted value of y when x = 6, we could simply type in:

> lin_fit(6);

Now to plot our data points and our model at the same time we will learn a new plot method designed to show two seperate plots on the same set of axes. This plot is more advanced so we'll have to start by pulling up the package containing advanced plots. (Note that by using a colon instead of a semicolon after the command the computer will pull up the package but won't show you the contents.) After that we'll define two plots "d1" and "d2" just as we would normally define them except we'll end the command with a colon instead of a semicolon. This tells the computer to make the assignment for d1 and d2 but not to print the results. On the last line we tell the computer to show us both the plots simultaneously.

> with(plots):

> d1:=plot(xydata,x=5..6,style=point):

> d2:=plot({lin_fit(x)},x=5..6):

> display([d1,d2]);

The computer is now showing both our original data points and the line that our model predicts other data would fall on.  Note that in this case, the model seems to match the data points almost exactly.

The next thing we'd like to do is find an exponential model for our data.  To do this we follow the same steps as above but now we use the transformed data we converted using the natural log.  So, instead of using the ydata, we use the ylogdata.   The only difference in our calculations is that we must handle the result of the least squares a little differently:

> fit[leastsquare[[x,y]]]([xdata,ylogdata]);

> exp_fit:=exp(rhs(%));

Before going any further, a comment is in order about the exp command. Maple doesn't recognize the number e that we think of as approximately 2.71828. In order to get Maple to recognize this irrational number we type exp followed by a set of parentheses. Whatever is inside the parentheses will be the exponent attached to e. So, if we just want plain old e we type exp(1) to tell the computer to raise e to the first power.

Now, let's look at what Maple has done for us.   In the first command, it gave us a linear model to correspond with the transformed data.  In the second line, we ask Maple to transform the linear model to an exponential model.  Since we've gone back and forth with our data, this new model represents an exponential model for the original set of data.

Although "exp_fit" does not look exactly like the model we want to use, it really is. We can convert it to a more recognizable form. We'll just ask the computer to exaluate e.9424238310 and display our model with this new information.  This step is for our benefit only, this expression will work just fine without the conversion.

> exp_fit:=expand(exp_fit);

One final note: this is important!, exp_fit is defined as an expression and not as a function. We cannot treat it as a function. We are still able to plot it, but if we want to make a prediction with this expression, we must do things differently than before. For example, if we want to predict y for x = 6, we must first assign the value of 6 to x and then look at exp_fit:

> x:=6;

> exp_fit;

After we find a specific value for "exp_fit" we can plot the expression much like before. Note that we must first make the computer see x as a variable again and not as the constant 6.

> unassign('x');

> d3:=plot({exp_fit(x)},x=5..6):

> display([d1,d3]);

Exercises:

1. Show that an exponential relationship y=Aekt   can be transformed into the linear relationship Y=kT+b by applying natural logs. In this case, how do the quantities Y, T and b relate to y, t and A? This process was described at the beginning of the lab, you just need to write out the mathematical details.  This is something you can do on your own paper- you don't need the computer for this.

2. Find a best fit linear function and a best fit exponential function for the following set of data. For each function provide a graph containing both the data points and the best fit function.