Standard Tests

Topics

  1. Introduction to Standard Tests
  2. Data Format
  3. Defaults
  4. Options
  5. Output
  6. Caveats
  7. Standard Tests Tutorial
  8. Literature Cited


1. Introduction to Standard Tests

Many scientists who are not familiar with null models are well-versed in standard parametric tests, such as ANOVA and regression (Sokal and Rohlf 1995). However, there are randomization tests analogous to all of the conventional statistical tests that you use (Edgington 1987). This module allows you to use null models for four of the most common statistical tests. If you are unfamiliar with null models, this is a good place to start learning about them.

In statistical analysis, we make a fundamental distinction between dependent and independent variables, sometimes called response and predictor variables. Dependent variables are those that we are trying to model or explain. Independent variables are those that we think are causal. In other words, we suspect that the independent variable is responsible for variation in the dependent variable. We then test a null hypothesis which typically says that variation in the dependent variable that can be "attributed" to the independent variable is no more than we would expect by chance. Rejecting this null hypothesis means that variation in the dependent variable is associated with variation in the independent variable.

It is the investigator who must designate the dependent and the independent variable and interpret the relationship between them. In experimental studies, we do this by manipulating the independent variable and measuring the response in the dependent variable. In non-experimental studies, we don't carry out a manipulation, but rely on natural variation present in the data (the "natural experiment"; Cody 1974). In both experimental and non-experimental studies, it can be tricky to determine the pathways of causality (James and McCulloch 1985, McGuinness 1988). Do predators control prey, do prey control predators, or do both predator and prey exhibit correlated responses to a third variable? It is the job of the experimentalist and the modeler to articulate the alternative hypotheses and then conduct experiments or gather data to distinguish among them (Hilborn and Mangel 1997, Bernardo and Resetarits 1998).

A second useful distinction in our data is that of continuous and discrete variables. Continuous variables are those that can be measured on a continuous scale, such as height, weight, or area. Discrete variables are those that allow us to classify individuals or subjects into well-defined categories, such as sex, species, or life history stage. The distinction between continuous and discrete variables isn't always clear-cut. For example, many continuous variables, such as size, can be reclassified as discrete variables with categories such as juveniles and adults. Conversely, discrete variables, such as habitat type, can sometimes be measured on a continuous scale, such as light penetration. The choice of discrete or continuous variables is dictated by the biology of the system and the measurements that are available to you.

Dividing the world up into discrete and continuous variables that are either independent or dependent gives us four categories for data analysis:

                                 Dependent Variable
                                 Continuous       Discrete
  Independent      Continuous    Regression       Runs Test
  Variable         Discrete      ANOVA            Chi-square

Here is an overview of the kinds of data analyses that fall in each category.

Regression

This category includes analyses in which both the independent and the dependent variables are continuous. Such data are graphed in a scatter plot, with the independent variable traditionally shown on the x axis, and the dependent variable shown on the y axis. In such a plot, each data point is an observation, which consists of a pair of <x,y> values. The null hypothesis is that variation in the y variable is unrelated to variation in the x variable; graphically, the plot of the x and y variables should resemble a scattershot of points.

This category includes familiar tests such as correlation and linear regression. Other statistical tests with continuous x and continuous y variables include multiple regression, non-linear regression, path analysis, quantile regression, and some multivariate methods such as principal components analysis.

EcoSim provides a very simple regression test for a continuous x and a continuous y variable. It fits a standard linear regression to your data, and then uses randomization to test the null hypothesis that the slope, intercept, or correlation coefficient equals 0.0.
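
As a concrete illustration of this procedure, here is a minimal Python sketch (assuming numpy; the function names are ours, and this is not EcoSim's actual code). The y values are reshuffled against the fixed x values, and the observed slope is compared to the distribution of simulated slopes:

    import numpy as np

    def slope(x, y):
        # Least-squares slope: sum[(x - xbar)(y - ybar)] / sum[(x - xbar)^2]
        xd, yd = x - x.mean(), y - y.mean()
        return np.sum(xd * yd) / np.sum(xd ** 2)

    def randomization_slope_test(x, y, n_iter=1000, seed=None):
        rng = np.random.default_rng(seed)
        observed = slope(x, y)
        sims = np.array([slope(x, rng.permutation(y)) for _ in range(n_iter)])
        # Upper-tail probability: fraction of simulated slopes at least as large
        return observed, np.mean(sims >= observed)

Note that permuting y leaves the x values and the means of both variables unchanged, which is why the slope, intercept, and r tests share a single reshuffling distribution (see the Caveats section).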

ANOVA

This category includes analyses in which the dependent variable is continuous, but the independent variable is discrete. Such data are plotted by illustrating the mean (and usually the variance or standard deviation) of the continuous variable for each of the categories of the discrete variable. The null hypothesis is that the variation among the means of the groups is no greater than expected by chance.

This category includes the familiar one-way ANOVA, in which observations are grouped according to a single classification system. For example, one might measure body size of 10 individuals each of 3 different species of fish. A one-way ANOVA would be used to test the null hypothesis that average body size does not differ among the 3 species. More complex data structures can also be handled with ANOVA, including 2- and 3-way designs, randomized block, split-plot, repeated measures, and nested analyses of variance. Even in these complex models, however, we are still analyzing a continuous dependent variable that is categorized according to one or more discrete independent variables.

EcoSim provides a randomization test for a simple one-way ANOVA. The continuous data are classified into two or more categories. EcoSim reshuffles the data among those categories and then determines how much variation is expected among the means of the different categories.
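
A minimal sketch of this reshuffling scheme in Python (assuming numpy; illustrative, not EcoSim's source code): the group labels are permuted among the observations, and a pseudo F-ratio is computed for each randomization:

    import numpy as np

    def pseudo_f(values, labels):
        # One-way F-ratio: among-group mean square / within-group mean square
        groups = [values[labels == g] for g in np.unique(labels)]
        grand = values.mean()
        ms_among = sum(len(g) * (g.mean() - grand) ** 2
                       for g in groups) / (len(groups) - 1)
        ms_within = sum(((g - g.mean()) ** 2).sum()
                        for g in groups) / (len(values) - len(groups))
        return ms_among / ms_within

    def randomization_anova(values, labels, n_iter=1000, seed=None):
        rng = np.random.default_rng(seed)
        observed = pseudo_f(values, labels)
        sims = np.array([pseudo_f(values, rng.permutation(labels))
                         for _ in range(n_iter)])
        return observed, np.mean(sims >= observed)  # upper-tail probability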

Chi-square

This category includes analyses in which both the dependent and the independent variable are discrete. Such data are plotted in a two-way table. Each row in the table represents one of the categories of the independent variable and each column represents one of the categories of the dependent variable. The entries in the table are the number (or percentages) of observations recorded for each category. The null hypothesis is that the response and predictor classifications are independent, so that the proportion of observations in one row of the table is the same as in any other row of the table. We will have more to say later about the null hypothesis for such a test. Other designs in this category include goodness-of-fit tests, and multi-way contingency tables (Fienberg 1980).

EcoSim uses a randomization test for the independence hypothesis in a two-way contingency table. The user provides the observed counts in each category (including zeroes). EcoSim calculates the expected values, randomizes the matrix, and calculates a chi-square deviation statistic for both the observed and simulated data. A unique feature of this module is that the user can change the expected values, and randomize across rows or columns of the matrix, allowing for more sophisticated tests in which expected values are generated from an external hypothesis (such as the 3:1 ratio of dominant to recessive phenotypes in a simple Mendelian cross of heterozygotes).

Runs test

This category includes analyses in which the dependent variable is discrete and the independent variable is continuous. This is the least familiar category of analysis in our table, but it is one that may have special utility for ecologists. For example, the dependent variable might be the presence or absence of a species on an island (discrete), and the independent variable might be the area of the island (continuous). We wish to test the null hypothesis that the occurrence of the species is independent of the area of the island; such tests amount to an analysis of the incidence function (Diamond 1975) of a species (Whittam and Siegel-Causey 1981, Schoener and Schoener 1983).

A runs test orders the observations by the continuous variable, and then measures the length of the longest "run", that is, the number of consecutive observations of the dependent variable with the same classification. In our example, this would be the number of consecutive islands for which the species is present. The null hypothesis is that the presences and absences are randomly interspersed with respect to island area, so that the observed run would be no longer (or shorter) than expected by chance. Other designs in this category include logistic regression, and logit or probit analysis.

EcoSim provides a randomization test for presence-absence data associated with a single, continuous variable. It tests for non-randomness in the length of the longest run (runs test), and in the mean and variance of the continuous variable for occupied sites. The user has the choice of two randomization methods: one in which the data are randomized equiprobably among the sites, and the other in which the probabilities of occurrence are proportional to the continuous variable. The latter is a model in which sites act as "targets", and the probability of randomly hitting the target is proportional to the measured variable (such as island area; see Coleman et al. 1982).
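
The following Python sketch (assuming numpy; an illustration under our own assumptions, not EcoSim's source) shows both randomization methods for binary data: sites are ordered by the continuous variable, occurrences are reshuffled either equiprobably or in proportion to that variable, and the longest run is tallied for each trial:

    import numpy as np

    def longest_run(presence):
        # Longest string of consecutive occurrences (1s) in the ordered data
        best = run = 0
        for v in presence:
            run = run + 1 if v == 1 else 0
            best = max(best, run)
        return best

    def runs_test(x, presence, n_iter=1000, proportional=False, seed=None):
        rng = np.random.default_rng(seed)
        order = np.argsort(x)
        x, presence = np.asarray(x)[order], np.asarray(presence)[order]
        n_occ = int(presence.sum())
        observed = longest_run(presence)
        p = x / x.sum() if proportional else None  # "target" model weights
        sims = []
        for _ in range(n_iter):
            sim = np.zeros(len(x), dtype=int)
            sim[rng.choice(len(x), size=n_occ, replace=False, p=p)] = 1
            sims.append(longest_run(sim))
        return observed, np.mean(np.array(sims) >= observed)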

Beginning with version 9.0, we have added two new features to the runs test. First, we allow for a response variable in which multiple "hits" can occur for a single x-value (such as multiple insect visitors to a single plant). Second, we allow for a continuous variable that is measured as angular data on a 0 to 360 degree scale (such as the compass orientation of an open flower).

2. Data Format

Regression

The input consists of a data matrix in which each row is an observation, and each column is a variable of continuous data. Missing data are not allowed. The user must select the column that corresponds to the x variable and the column that corresponds to the y variable. As in all EcoSim data sets, the very first column gives the row labels, and the very first row gives the column labels.

ANOVA

The input consists of a data matrix in which each row is an observation. As in all EcoSim data sets, the very first column is a set of row labels that uniquely identify each observation. In this module, one of the data columns (usually the second) should contain the "labels" for the different categories. These labels identify the groups that the observations are classified into. Labels may be any set of alpha-numeric characters, but must not include any blanks (which EcoSim interprets as a column marker). Be careful when entering labels, because EcoSim will interpret any spelling errors as unique labels. There must be at least two different labels, with at least one observation present for each labelled category. The response variable is a continuous numeric variable contained in one of the other columns of the data matrix. The response variable can include any real number values (including zeroes and negative numbers). EcoSim asks you to specify which variable is the continuous response variable, and which variable is the "label" (the discrete predictor variable).

Chi-square

The input consists of a data matrix in which each row is a category of the independent variable and each column is a category of the dependent variable (or vice-versa; the test does not depend on whether the dependent variable is placed in rows or in columns). Each entry in the data matrix consists of the number of observations in each category. Zeroes are acceptable input, but negative numbers and non-integer values are not. Unlike other EcoSim modules, this analysis uses the entire data table as input, so the user is not asked to select particular columns for analysis. For some Chi-square analyses, the data may consist of a vector: a single row or a single column of data.

The user may optionally provide input in the form of a matrix of expected values. This matrix must have the same dimensions and labels as the original data matrix, but the entries contain the expected values under the null hypothesis. The values in the expected matrix are non-zero real numbers. It is not necessary that these values be integers (as they must be in the data matrix), but they cannot include any zeroes. For the default test of two-way independence, EcoSim automatically calculates the expected values based on the marginal totals of the matrix. The user can modify these expected values to test other null hypotheses. The matrix of expected values can be edited by the user and saved to disk. The user can also use the file menu to load a matrix of expected values into the program.

Runs Test

The input consists of a data matrix in which each row is an observation. One column of data represents a presence-absence variable containing only 1s or 0s (or other integer values) as entries. This is the discrete response variable. A second column contains the continuous predictor variable, which must be a positive real number. EcoSim will ask you to specify these two variables for analysis.

3. Defaults

All of the standard tests use 1000 replications as the default and take their random number seed from the system clock. Simulated matrices are not saved to disk. These options are all specified on the General Tab of the input.

Regression

The default analysis uses the first column of data as the x variable and the second column of data as the y variable.

ANOVA

As in the regression analysis, the default ANOVA analysis uses the first column of data as the x variable ("labels") and the second column of data as the y variable.

Chi-square

In the default analysis, EcoSim calculates and displays expected values that are based on the marginal totals in the matrix. For row i column j in the matrix, the expected value for cell ij is:

Eij = (Ri × Cj) / N

where Ri is the row total for row i, Cj is the column total for column j, and N is the grand total of all observations in the matrix. Remember that if you alter values in the observed matrix, EcoSim will not automatically re-calculate the expected matrix. If you want to update the expected matrix, hit the button Reset Defaults, and EcoSim will recalculate the expected values based on the current data set.
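
The marginal calculation is easy to reproduce yourself; here is a short Python sketch (assuming numpy, with a hypothetical 2 x 2 count table):

    import numpy as np

    obs = np.array([[12,  8],
                    [ 5, 15]])         # hypothetical count table
    R = obs.sum(axis=1)                # row totals Ri
    C = obs.sum(axis=0)                # column totals Cj
    N = obs.sum()                      # grand total N
    expected = np.outer(R, C) / N      # Eij = (Ri x Cj) / N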

The default simulation algorithm for Chi-square is Randomize Matrix. This option distributes the N observations randomly and independently to the cells in the matrix. The probability Pij that cell ij receives one of the observations is:

Pij = Eij / N = (Ri × Cj) / N²

Thus, when the null hypothesis is true, the average number of observations in each cell of the matrix will equal the expected value for that cell calculated by EcoSim.
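
One way to implement this allocation step, continuing the sketch above (Python with numpy; not necessarily how EcoSim does it internally), is a single multinomial draw of the N observations over the cell probabilities:

    import numpy as np

    def randomize_matrix(expected, N, rng):
        # Each of the N observations lands in cell ij with Pij = Eij / N
        p = (expected / expected.sum()).ravel()
        return rng.multinomial(N, p).reshape(expected.shape)

    sim = randomize_matrix(expected, N, np.random.default_rng())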

Runs Test

In the default analysis, EcoSim uses the Equiprobable simulation constraint, and reshuffles the occurrences randomly and equiprobably among the samples. The default sampling algorithm is without replacement, meaning that EcoSim treats the data as presence-absence and does not consider the number of occurrences or visits for each observation. The default data type is linear. In the default analysis, EcoSim uses the first column of data as the x variable (continuous) and the second column of data as the y variable measuring presence-absence (discrete).

4. Options

Regression

No options are available to the user, other than the selection of the columns to be used for the x and y variables.

ANOVA

As in Regression, no options are available to the user other than the selection of the columns to be used for the categories ("labels") and the y variable.

Chi-square

Expected

The expected matrix is important because it is used to calculate the deviation statistic for both the observed and the simulated matrices. The expected matrix and the simulation algorithm together determine the null hypothesis being tested, so you will need to make sure these two are used correctly; see the discussion in "Caveats".

Simulation Options

Three simulation options are available for randomizing the matrix. These should be chosen to match the expected matrix:

1) Randomize matrix This option randomizes the N observations and randomly reallocates them to the cells in the matrix. As explained earlier, the probability of occurrence in a cell is scaled to the marginal values of the matrix. This randomization algorithm should only be used with the default expected values calculated by EcoSim. If you want to use some other expected values, you should either randomize the rows or the columns of the matrix.

2) Randomize Columns This option randomly reshuffles the observed values within each column of the matrix. Observations are assigned randomly and equiprobably to one of the cells within each column.

3) Randomize Rows This option randomly reshuffles the observed values within each row of the matrix. Observations are assigned randomly and equiprobably to one of the cells within each row.

Runs Test

Simulation Constraints

Two null model algorithms are available to you for randomizing the occurrences in your data:

1) Equiprobable In this (default) algorithm, occurrences are randomly and equiprobably reshuffled among different rows of the data.

2) Proportional In this algorithm, occurrences are randomly reshuffled, but the rows are no longer equiprobable. Instead, the probability of occurrence is proportional to the value of the continuous variable for that row.

Sampling Algorithm

The sampling algorithm determines whether the response variable will be treated as discrete presence-absence data or as integer counts.

1) Without Replacement With this algorithm, the integer value of the occurrence data is not considered, and the program reshuffles the occurrences. For example, if the data set consisted of 100 samples, and three samples had values of 1, 3, and 6, these 3 occurrences would be randomly reshuffled among the 100 samples.

2) With Replacement With this algorithm, the integer values of the occurrence data are independently reshuffled. In the example above, if the three samples had values of 1, 3, and 6, the program would randomly and independently place 1 + 3 + 6 = 10 occurrences among the samples, with the possibility of multiple hits on one or more samples. Although this analysis does contain a total of 10 occurrences, they are not necessarily limited to exactly 3 samples, as in the original data.
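
In the example above, the with-replacement placement could be sketched like this (Python with numpy; the 100 samples and 10 occurrences come from the example):

    import numpy as np

    rng = np.random.default_rng()
    counts = np.zeros(100, dtype=int)      # 100 samples, as in the example
    hits = rng.integers(0, 100, size=10)   # 1 + 3 + 6 = 10 independent placements
    np.add.at(counts, hits, 1)             # multiple hits can land on one sample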

Data Type

Data type refers to the continuous variable:

Linear All continuous data except for angular data are of a linear type.

Angular Angular data are data that are measured in compass degrees (0 to 360). Angular data must be treated differently because even simple statistics, such as the mean, cannot be calculated in the usual way. For example, if two angles have measures of 2 degrees and 358 degrees, the midpoint between them is 0 degrees. However, if you calculate an ordinary mean, the answer is 180 degrees! EcoSim uses circular statistics to handle angular data correctly. See Fisher (1993) for more details on circular statistics.
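
The standard approach, sketched below in Python (assuming numpy; an illustration, not EcoSim's code), is to average the angles as unit vectors and then convert back to a compass bearing:

    import numpy as np

    def circular_mean_deg(angles_deg):
        # Average the sin and cos components, then recover the mean angle
        rad = np.deg2rad(angles_deg)
        mean = np.arctan2(np.sin(rad).mean(), np.cos(rad).mean())
        return np.rad2deg(mean) % 360

    circular_mean_deg([2, 358])   # 0.0, not the misleading arithmetic mean of 180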

5. Output

Three output tabs appear in the output of all four standard tests:

Input Matrix Tab

The Input tab shows you the original data matrix, with all of its labels. You cannot edit the data in this window, but you can refer back to the original data set as you study the simulation results.

Simulated Matrix Tab

The Simulation tab shows you the most recent simulated matrix that was created by EcoSim. By clicking back and forth between the Input and Simulation tabs, you can examine this randomized matrix and convince yourself that EcoSim has randomized the data in the way that you wished. Different randomization algorithms will change the appearance and structure of the simulated matrix. Also note that the contents of the simulated matrix will change each time you run the simulation, unless you have specified a particular random number seed.

Summary Tab

The summary tab gives the simulation conditions, including the name of the input file, the randomization algorithm, measured index, number of iterations, constraints, and random number seed.

The summary window also supplies you with the standardized effect size, which is calculated as:

(observed index - mean(simulated indices)) / standard deviation(simulated indices)

This metric is analogous to the standardized effect size that is used in meta-analyses (Gurevitch et al. 1992). It scales the results in units of standard deviations, which allows for meaningful comparisons among different tests. Roughly speaking, a standardized effect size that is greater than 2 or less than -2 is statistically significant with a tail probability of less than 0.05. However, this is only an approximation, and it assumes that the data are normally distributed, which is often not the case for null model tests. For any individual study, you should always report the actual tail probability, which is calculated directly from the simulation, and does not require any assumptions about normality of the data.
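
In code, the calculation is a one-liner (Python sketch, assuming numpy):

    import numpy as np

    def standardized_effect_size(observed, simulated):
        sims = np.asarray(simulated)
        # (observed - mean of simulated) / standard deviation of simulated
        return (observed - sims.mean()) / sims.std(ddof=1)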

Finally, the summary tab shows the original data matrix, including the row and column labels.

All of these data can be edited, deleted, or annotated. The output can then be saved (Save to File) or discarded (Close). There is also a small time clock in the lower right-hand corner so you can tell how long your simulation took.

Next, we will describe individual output tabs that are unique to each of the four standard tests:

Regression

Charts Tab

This tab shows scatterplots of your real and simulated data sets. The upper panel shows the y vs. x scatterplot of the real data. Each data point is indicated in blue. The dashed red line is the least squares regression line calculated for the real data.

The lower panel shows the y vs. x scatterplot for one of the simulated data sets. Each data point is indicated in blue. The dashed red line is the average least squares regression line for the simulated data sets. Its slope will be close to zero.

Intercept Tab

This tab illustrates the intercept of the regression equation:

y = a + bx

where a is the intercept of the regression equation. The intercept is the predicted value of the y variable when the x variable takes on a value of 0.0.

This tab gives the actual probability test in which the observed intercept is compared to the intercept in the simulated communities. In the left-hand column of this tab, you will see the value of the observed index.

The next three columns form the histogram window, which summarizes the distribution of the intercept for the randomized data sets. The first two columns give the low and high boundaries of 12 evenly spaced histogram bins. In the right-hand column, the number of simulations tells you how many of the simulated intercepts were in each bin. These integers sum up to the total number of iterations that were specified for the run.

The placement of the observed intercept shows you, graphically, where the observation fell in the histogram distribution. You can use these data to plot the histogram and the observed value if you want to illustrate your results with a graph.

The lower window gives summary statistics (mean and variance) for the intercept of the randomized data sets. It then tells you the tail probability that the observed intercept was greater than or less than expected by chance.

Slope Tab

The slope b of the regression equation is calculated as:

b = Σ[(x - x̄)(y - ȳ)] / Σ[(x - x̄)²]

where x and y are the paired coordinates for each data point, x̄ and ȳ are their means, and the summation is over all of the observations in the data set. The slope of the regression equation indicates whether there is a positive or a negative relationship between the variables. A slope of 0.0 indicates that the fitted regression line is flat, and represents the null hypothesis that is being tested.

The description of the output for this tab is the same as for the Intercept Tab.

r Tab

The product-moment correlation coefficient r is calculated as:

r = Σ[(x - x̄)(y - ȳ)] / √[Σ(x - x̄)² × Σ(y - ȳ)²]

where x and y are the paired coordinates for each data point, x̄ and ȳ are their means, and the summation is over all of the observations in the data set. The correlation coefficient ranges from a minimum of -1.0 to a maximum of 1.0. It measures how well the data match up with the predictions of the fitted line. If r = 1.0, the slope of the regression is positive, and all of the data points fall on the predicted line. If r = -1.0, the slope is negative, but all of the points still fall on the predicted line. The closer r is to 0.0, the more scatter there is of data points around the regression line. The null hypothesis in this case is that r = 0.0, in which case there is no significant correlation (positive or negative) between the x and the y variable.
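
Note that the slope and r share the same numerator, a point that matters in the Caveats section below. Here is a Python sketch of all three regression statistics (assuming numpy; the names are illustrative):

    import numpy as np

    def regression_stats(x, y):
        xd, yd = x - x.mean(), y - y.mean()
        b = np.sum(xd * yd) / np.sum(xd ** 2)   # slope
        a = y.mean() - b * x.mean()             # intercept: line passes through (xbar, ybar)
        r = np.sum(xd * yd) / np.sqrt(np.sum(xd ** 2) * np.sum(yd ** 2))
        return a, b, r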

The description of the output for this tab is the same as for the Intercept Tab and the Slope Tab.

ANOVA

Observed Means Tab

For each group in the original data, this tab shows the group label and the mean and variance of the response variable.

Simulated Means Tab

For one of the simulated data sets, this tab shows the group label, the mean, and the variance of the response variable. Under the null hypothesis, these means will all be similar; any differences between them reflect random variation in the assignment of observations to groups.

Pseudo F-Ratio Tab

EcoSim calculates the standard F-ratio as the test statistic. We call it a "pseudo-" F-ratio to remind you that the significance test is not determined indirectly, by looking up the theoretical value in a statistical table, but directly, through simulation. The larger the F-ratio, the more different the means are among the groups; the smaller the F-ratio, the more similar the group means are.

This tab gives the actual probability test in which the observed F-ratio is compared to the F-ratio in the simulated communities. In the left-hand column of this tab, you will see the value of the observed F-ratio.

The next three columns form the histogram window, which summarizes the distribution of the F-ratio for the randomized data sets. The first two columns give the low and high boundaries of 12 evenly spaced histogram bins. In the right-hand column, the number of simulations tells you how many of the simulated F-ratios were in each bin. These integers sum up to the total number of iterations that were specified for the run.

The placement of the observed F-ratio shows you, graphically, where the observation fell in the histogram distribution. You can use these data to plot the histogram and the observed value if you want to illustrate your results with a graph.

The lower window gives summary statistics (mean and variance) for the F-ratio of the randomized data sets. It then tells you the tail probability that the observed F-ratio was greater than or less than expected by chance.

Chi-square

Expected Matrix Tab

This tab illustrates the expected cell values that were used in the chi-square calculation.

Chi-square Tab

EcoSim calculates a chi-square deviation statistic to compare the fit of the data to the expected matrix. The chi-square deviation (CSD) is calculated as:

CSD = Σ(i=1 to n) Σ(j=1 to s) (Obsij - Expij)² / Expij

where the data table has i = 1 to n rows and j = 1 to s columns. For cell ij in the table, Obsij is the observed value and Expij is the cell expectation (taken from the Expected Matrix).
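
Expressed in code (a Python sketch, assuming numpy):

    import numpy as np

    def chi_square_deviation(obs, exp):
        # Sum over all cells of (Obs - Exp)^2 / Exp
        return np.sum((obs - exp) ** 2 / exp)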

Note that the chi-square deviation statistic is calculated both for the actual data and for each of the simulated data matrices. The larger the CSD, the more different the data are from the expected values. This tab allows you to compare the CSD for the observed data with the CSD for the simulated data. If the CSD is significantly large, the observed data fit the expected values significantly worse than do the simulated data. If the CSD is unusually small, the observed data are significantly closer to the expected values than expected by chance. If the observed CSD does not fall in the extreme tails, then the deviation of the observed data from the expected matrix is about what would be expected by chance.

This tab gives the actual probability test in which the observed CSD is compared to the CSD in the simulated communities. In the left-hand column of this tab, you will see the value of the observed CSD.

The next three columns form the histogram window, which summarizes the distribution of the CSD for the randomized data sets. The first two columns give the low and high boundaries of 12 evenly spaced histogram bins. In the right-hand column, the number of simulations tells you how many of the simulated CSDs were in each bin. These integers sum up to the total number of iterations that were specified for the run.

The placement of the observed CSD shows you, graphically, where the observation fell in the histogram distribution. You can use these data to plot the histogram and the observed value if you want to illustrate your results with a graph.

The lower window gives summary statistics (mean and variance) for the CSD of the randomized data sets. It then tells you the tail probability that the observed CSD was greater than or less than expected by chance.

Runs Test

Length of Longest Run Tab

This tab shows the length of the longest run, which is the largest number of consecutive sites in which there was a "1" in the data (occurrence), when the data are ordered from the smallest to the largest value of the continuous variable. The null hypothesis is that the length of the longest run is no different than expected by chance when the data are ordered by the continuous variable. If the length of the longest run is greater than expected, the occurrences are "clustered" together significantly. If the length of the longest run is less than expected, the occurrences are unusually "segregated" among the ordered sites.

This tab gives the actual probability test in which the observed run is compared to the runs in the simulated communities. In the left-hand column of this tab, you will see the value of the observed run.

The next three columns form the histogram window, which summarizes the distribution of the run for the randomized data sets. The first two columns give the low and high boundaries of 12 evenly spaced histogram bins. In the right-hand column, the number of simulations tells you how many of the simulated runs were in each bin. These integers sum up to the total number of iterations that were specified for the run.

The placement of the observed run shows you, graphically, where the observation fell in the histogram distribution. You can use these data to plot the histogram and the observed value if you want to illustrate your results with a graph.

The lower window gives summary statistics (mean and variance) for the runs of the randomized data sets. It then tells you the tail probability that the length of the observed run was greater than or less than expected by chance.

Mean Tab

This tab shows the mean of the continuous variable for the sites in which occurrences were recorded. It measures whether the occurrences tend to be clustered at relatively large or relatively small values of the continuous variable.

The description of this tab is the same as for the Length of the Longest Run Tab.

Variance Tab

This tab shows the variance of the continuous variable for the sites in which occurrences were recorded. It measures whether the occurrences tend to be relatively clustered or overdispersed through the data. It should give results that are similar to the runs test. Specifically, data sets with relatively long runs will tend to have small variances (most of the occurrences are aggregated around a few consecutive values of the continuous variable), whereas data sets with unusually short runs will tend to have large variances (occurrences are widely dispersed among the data).

The description of this tab is the same as for the Mean Tab and the Length of the Longest Run Tab.

6. Caveats

Regression

Although EcoSim gives you three tabs of output (slope, intercept, and r), these measures are not really independent of one another. The p values for the slope and for r are identical, and the shapes of the histogram distributions are the same, even though the numerical values are different. If you examine the formulas for the slope and the correlation coefficient, you will see they differ only by the presence of a y-squared term in the denominator. Both measures are quantifying the relationship between x and y, but in different units.

Perhaps more surprising is the fact that the test for the intercept is also not independent. If you examine the output carefully, you will see that the histogram for the intercept is a mirror image of the histogram for the slope (and for r), and that the p value for the intercept test equals 1.0 - p for the slope or r test. This situation arises because all regression lines must pass through the mean x and mean y values of the data:

ȳ = a + b·x̄

Because EcoSim simply reshuffles the observed data, the mean y and mean x values do not change in the simulated data sets. Therefore, simulated data sets with a large slope will have a small intercept, and vice versa. This causes the results of the intercept test to be in the opposite tail of the distribution as the slope and r tests.

The intercept test is included for completeness, but most users will want to base their analyses on the test of slope or of r, since these tests address the strength of the relationship between x and y.

Perhaps the most subtle issue in using any statistical test, be it a null model or a standard parametric analysis, is deciding whether the observations are truly independent of one another. For example, if each observation in your data set is a different species, one could argue that such observations are not truly independent of one another. We expect closely related species to have similar attributes, so that data on 10 closely related species in the same genus might not constitute 10 independent points. In some analyses, we actually are testing for this independence itself. For example, a comparison by ANOVA of the body sizes of fish species in different genera (see tutorial) can be used to decide whether the body sizes of species in the same genus can be treated as independent. The comparative method (Harvey and Pagel 1991) is a statistical framework for dealing with data that exhibit non-independence because of a shared evolutionary history. Many sophisticated statistical techniques, including phylogenetic regression (Garland et al. 1993), are now available for analyzing such data. However, such analyses may require a great deal of phylogenetic information, and in some cases, the results may not differ much from analyses that ignore phylogenetic effects (Ricklefs and Starck 1996).

ANOVA

The one-way ANOVA that EcoSim provides offers two advantages over conventional analysis of variance: first, it does not assume that the data are normally distributed with equal variances among groups, and second, it is less sensitive to unequal sample sizes among the groups (an unbalanced design). Some non-parametric tests, such as the Mann-Whitney U-test or the Wilcoxon two-sample test, have similar properties. However, the results of all statistical tests may be sensitive to small sample sizes.

If the sample sizes in your data are small, a useful precaution is to use a jackknife analysis (Manly 1991). If your data set has n observations, delete the first observation, re-run the test with only n - 1 observations, and record the result. Now replace that data point, delete the second observation, and repeat. You will end up with a set of n analyses, each based on n - 1 observations, with a different data point deleted from each analysis (Efron and Gong 1983). If all these results are qualitatively the same, you can be confident in your interpretation. However, if the statistical significance of the results changes substantially in these different runs, you need to be more cautious, because deletion of a single observation could have changed your interpretation. In such cases, the best solution may be to try and collect more data!
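
A jackknife loop of this kind is straightforward to script; in the Python sketch below (assuming numpy), run_test is a hypothetical stand-in for whatever randomization test you are applying:

    import numpy as np

    def jackknife_pvalues(values, labels, run_test):
        # Delete one observation at a time, re-run the test, collect p values
        n = len(values)
        pvals = []
        for i in range(n):
            keep = np.arange(n) != i
            pvals.append(run_test(values[keep], labels[keep]))
        return np.array(pvals)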

Chi-square

The Chi-square test offers two important advantages over conventional analyses. First, the results are not sensitive to small expected values. If the expected value for a cell in the analysis is less than 1.0, deviations for that cell can cause big changes in the chi-square deviation statistic. This can cause problems for parametric analysis, but it does not distort the randomization test because the same expected values are used for both the observed and the simulated data. The second advantage of the randomization test is that the user does not have to worry about specifying the degrees of freedom associated with the test. In a conventional analysis, the number of degrees of freedom depends subtly on whether extrinsic or intrinsic hypotheses are used to generate the expected values (Sokal and Rohlf 1995).

Nevertheless, EcoSim's Chi-square test can get you into a lot of trouble if you are not very careful about setting the expected values and choosing the appropriate randomization algorithm. Here are some guidelines to help you.

First, as long as your data table has at least 2 rows and at least 2 columns, you should initially use the default values as a test for 2-way independence. The null hypothesis is that the row frequencies are independent of the column frequencies. For example, if the table consisted of an analysis of the frequencies of males and females in 3 species of grasshoppers, the default values will test the null hypothesis that the sex ratio is the same for all three species of grasshoppers. Equivalently, we are testing whether the proportion of individuals in each of the 3 species of grasshopper is the same for males as it is for females.

The next step is to test other hypotheses by altering both the expected values in the matrix and the simulation algorithm used. The basic rule for altering the expected values is that they must still sum to the observed row totals (if you randomize rows) or to the observed column totals (if you randomize columns).

So, what other values should you use for the expected matrix?

A natural null hypothesis is that of a uniform distribution of observations among the categories. If you choose to randomize rows, then you should set all of the expected values within a given row to be equal, remembering that they must sum up to the observed row totals. Now you are testing the null hypothesis that the categories in the columns are being sampled equiprobably, even though each row might have a different total number of observations.

Conversely, you can randomize the columns and set the expected values the same within each column, again ensuring that the sum of each column in the expected matrix matches the sum of the corresponding column in the original data matrix. Now you are testing the null hypothesis that the categories in each row are being sampled equiprobably for each of the different columns.
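
Both uniform expected matrices are simple to construct; here is a Python sketch (assuming numpy; the axis convention is ours):

    import numpy as np

    def uniform_expected(obs, axis):
        # axis=1: equal values within each row, summing to the row total;
        # axis=0: equal values within each column, summing to the column total
        obs = np.asarray(obs, dtype=float)
        n_rows, n_cols = obs.shape
        if axis == 1:
            return np.tile(obs.sum(axis=1, keepdims=True) / n_cols, (1, n_cols))
        return np.tile(obs.sum(axis=0, keepdims=True) / n_rows, (n_rows, 1))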

These two tests represent intrinsic hypotheses of uniform distributions of values. As an analogy with a standard ANOVA, the default analysis tests the interaction between the two variables, and the uniform row and column sampling tests examine the "main effects" of row and column differences.

So, your first exploration of the data should begin with the default values for the independence test, followed by tests for main effects of differences in row and column frequencies. These latter two tests will require you to alter the expected values and the simulation algorithm for the appropriate null hypothesis.

Incidentally, the 2-way test is appropriate only when the dimensions of your data matrix are at least 2 x 2. If you provide EcoSim with a single row or column vector as the input matrix, the default expected matrix is going to match the observed values exactly, and the test won't make sense! If you are using a vector of data, there is no 2-way independence test that can be performed because you only have one factor. In this case, you will have to immediately alter the expected values and set the simulation algorithm to randomize rows (for a row vector) or randomize columns (for a column vector).

If you have theoretically or empirically derived cell expectations, these can also be plugged into the expected matrix. Just remember to ensure that the row sums or column sums match those of the original data (EcoSim will not check this for you). Then use the appropriate randomization of rows or columns to match.

As an example of this sort of analysis, you might generate expected frequencies of different genotypes in a population from the Hardy-Weinberg equilibrium, or some other evolutionary model. You could first test your data against the null hypothesis of equal frequencies of observations and then test against the null hypothesis of frequencies derived from the genetic model. If the genetic model is providing a good fit to the data, you should find that the first null hypothesis is rejected, but not the second (see Hilborn and Mangel 1997 for methods of choosing among alternative models).

As a second example, you could derive the expected frequencies of species in different abundance classes using a broken-stick, log-normal, or some other species-abundance model (Magurran 1988), and then use these expectations to generate hypothesis tests for the observed frequencies in a community that you have sampled (Wilson 1993).

Runs Test

The runs test explores whether occurrences are non-random with respect to a continuous predictor variable. The simple null hypothesis (equiprobable sampling) is that the length of the longest run is no greater than expected by chance, and that the mean and variance of the continuous variable for the occupied sites do not differ from what is expected by chance.

Proportional sampling provides a sophisticated twist on this null hypothesis, and treats each site as a "target" with the probability of occurrence being proportional to the continuous variable. Many data sets that reject the simple null hypothesis of equiprobable placement may show a random fit with this proportional model. Both models are ideal for testing the distribution of individual species and determining whether there are non-random incidence functions or critical minimum areas for site occupancy (Simberloff and Gotelli 1984). Note that data transformations of the continuous variable will not affect the runs test unless the ordering of the sites is altered. However, the results of the proportional sampling may be quite sensitive to data transformations, because these will alter the relative probabilities with which each site is sampled. Thus, species occurrences might be non-random with respect to island area, but random with respect to the logarithm of island area.

Finally, note that the EcoSim runs tests can accept either binary presence-absence data or integer counts for the response variable. If you use integer counts for the response variable, you are assuming that each element is randomly and independently placed (sampling with replacement), whereas the more conservative presence-absence test uses only the distinct sample occurrences as independent elements (sampling without replacement).

7. Standard Tests Tutorial

For all of the tutorial analyses, select the appropriate module under Standard Tests. Next, go to the General Tab and set the Random Number Seed to 10. Doing so will ensure that your results exactly match those in the tutorial. For general data analysis, use the default value of 0, so that EcoSim will retrieve a different random number seed every time it runs.

Regression

For the regression analysis, first open the data file Midwest fishes.txt, which can be found in the Tutorial Data Sets folder within EcoSim. As explained in the Macroecology tutorial, this file gives you data on 41 species of fishes from the Cimarron River, Oklahoma (Gotelli and Taylor 1999a, 1999b). Each row is a different species of fish and each column is a different macroecological variable measured for these species. For our analysis, select as the x variable "SIZE" and as the y variable "EXT". These variables are the average body size of the species, and the average annual probability of local extinction, measured for 10 years at a set of 10 sites in the Cimarron River.

Now run the simulation, which shouldn't take more than a second or two.

The Input Matrix Tab shows you the original data, and the Simulation Tab shows you one of the randomized data sets. Notice that the randomization reshuffles only the y values, which is sufficient to scramble the pattern with respect to the x values.

Now examine the Charts Tab. You can see that the actual data exhibit a positive slope, suggesting that large-bodied species are more extinction-prone. However, the observed data set does not appear strikingly different from the simulated data set, and we will need to examine the probability distribution to make a decision on this pattern.

The Intercept Tab, Slope Tab, and r Tab demonstrate that the positive relationship between body size and extinction probability is indeed greater than expected by chance. The observed intercept is 0.41718, compared to an average intercept of 0.50428. The observed intercept is significantly smaller than expected by chance (p = 0.027). Similarly, the observed slope (0.0021) is significantly larger than the average slope in the randomized data sets (0.00001; p = 0.027), as is the correlation coefficient r (observed r = 0.30387, average r in simulated data sets = 0.01019, p = 0.027). We conclude that the relationship between body size and extinction probability is non-random. However, it is difficult to tease apart cause and effect in such a simple analysis, because body size is correlated with other factors such as population size, which also influence the probability of extinction (Gotelli and Taylor 1999b). It is interesting to compare the results of this randomization test with the standard parametric test for the significance of the slope:

Significance of Slope
  Parametric Test       0.040
  Randomization Test    0.027

In this case, the two analyses both generate significant results, although the parametric test is less extreme in its p value.

ANOVA

To demonstrate the ANOVA test, load the file Fish body sizes.txt, which can be found in the Tutorial Data Sets folder within EcoSim. This data file gives body sizes for 9 fish species in 3 genera (Lepomis, Pimephales, and Carpiodes) that were sampled in the Cimarron River, Oklahoma (Gotelli and Taylor 1999a, 1999b). Body size measurements (maximum standard length in mm) were taken from Lee et al. (1980). These data do not include all of the species in each genus or all of the species that were found in the Cimarron, but they are sufficient to illustrate the use of the ANOVA test.

The very first column in the data set gives the name of each species. The second column gives the genus, and the third column gives the body size.

We wish to test the null hypothesis that average body size does not differ significantly among species in the different genera. Note that we are here testing for variation among species but within genera. We do not have data on variation among individuals within a species.

Select the ANOVA analysis. The Preferences Tab asks you to specify the labels and the y variable. In this case, you can retain EcoSim's default choices because the labels are given in the column with the genus names, and the y variable is body size.

Run the analysis and examine the output tabs.

As usual, the Input Matrix Tab shows you the original data, and the Simulation Tab shows you one of the randomized data sets. In the randomized data set, notice that it is the labels themselves that are reshuffled and re-assigned to each of the species. Thus, the null hypothesis is that the arrangement of the data is no different than if the species were randomly assigned to the different genera.

The Observed Means Tab gives you the average body size, variance in body size, and number of observations for each of the genera in your data set. These data are useful for graphing your output. You can see that average body size varies from a minimum of 95.5 mm in Pimephales to a maximum of 520.0 mm in Carpiodes. However, the sample sizes are small and unequal, and the variances are large, so it is not clear if these differences are statistically significant.

The Simulated Means Tab displays the same data for one of the randomized data sets. On average, the means in each group of the simulated data should be approximately the same. However, with such small sample sizes, there is considerable variability among groups even in this random data set (151.0 to 374.5 mm).

In spite of such variation, the analysis does indicate that the observed means for each genus are indeed different from one another. The Pseudo F-Ratio Tab displays the observed F-ratio (6.488) and the distribution of 1000 F-ratios for the simulated data sets. These have a mean of only 1.540, and the observed F-ratio is significantly larger than the simulated (p = 0.028).

The interpretation is that the differences in means that were observed for the 3 genera are unusually large, generating a large among-group variance. Once again, it is interesting to compare to conventional statistical analyses. This time we are comparing to a standard one-way ANOVA and a non-parametric Kruskal-Wallis test:

Significance of Test
  One-way ANOVA         0.032
  Kruskal-Wallis Test   0.089
  Randomization Test    0.028

Although the parametric ANOVA and the randomization test give similar results, the nonparametric Kruskal-Wallis test is marginally non-significant.

Chi-square

To demonstrate the chi-square test, open the data file Alaska seabirds.txt in the Tutorial Data Sets folder within EcoSim. These data are from Table 1 of Whittam and Siegel-Causey (1981). Each row is a different species of Alaska seabird (abbreviations are spelled out in Table 1 of Whittam and Siegel-Causey (1981)), and each column is a species richness class, ranging from 1 to 13. The final category (13S) includes all colonies with 13 to 20 species present. Each entry in the table is the number of occurrences of a particular species in a colony of a particular size. For example, there were 3 records in which the Fork-tailed Storm Petrel (FSP; Oceanodroma furcata) occurred in colonies with only 1 species (i.e., by itself), and 13 records in which the Pelagic Cormorant (PC; Phalacrocorax pelagicus) was found in colonies of exactly 9 species.

Our first test for these data will be a test of 2-way independence. Do species differ in the frequency with which they occur in small versus large colonies? Equivalently, the 2-way independence test asks whether large and small colonies differ in the frequencies with which different species are represented.

Calling up the Chi-square test immediately displays the expected matrix. Take a minute to study this matrix carefully. From the formula given earlier, you will see that these expected values are derived from the row and column totals of the original matrix.

For example, in the expected matrix, the cell with the greatest expected value (75.25) is the Glaucous-winged Gull (GWG; Larus glaucescens) in colonies of 4 species (4S). This cell has the largest expected value because it has both the largest row total (531 for GWG) and the largest column total (440 for 4S). Whereas the expected value was 75.25, the observed value in the data matrix in this cell was 79.

Conversely, the cell in the expected matrix with the smallest value (0.07) is the Red-legged Kittiwake (RLK; Rissa brevirostris) in colonies of 12 species (12S). This cell has the smallest expected value because it has both the smallest row total (6 for RLK) and the smallest column total (36 for 12S). The expected value of 0.07 compares with an actual frequency of 0 in the original data matrix. In other words, the Red-legged Kittiwake was never recorded in colonies with exactly 12 species.

Although the observed and expected values match reasonably well for these two cells, the overall fit is highly non-random. Run the analysis with the defaults to see this result. The Input Matrix Tab shows the original data, the Expected Matrix Tab shows the expected matrix (derived in this analysis from the marginal totals), and the Simulated Matrix Tab shows one of the random matrices. If you created a large number of such random matrices, the averages for each cell would be very close to the values in the expected matrix. The Chi-square Tab shows you the deviation statistic for the observed matrix (371.17) as well as the distribution of values for the 1000 simulated matrices (average = 206.75). The probability test is highly significant (p < 0.001), indicating that species differ in their frequency of occurrence in species-rich and species-poor colonies.

Now, let's move beyond EcoSim's default values and test two other hypotheses about the structure in these data. Both hypotheses concern the distribution of marginal values in the matrix. First we will examine a model in which the row totals of the matrix are held constant, but the column totals are allowed to vary randomly and equiprobably (see the additional discussion of these types of algorithms in the Co-occurrence Help). Here we are testing whether each species is represented equally in colonies of different size.

Run the analysis a second time for this data matrix, but first change the simulation constraint to Randomize Rows. Next, go to the File menu and load from disk the file named Seabird expected rows.txt. You will see that the expected values have now changed. Notice that the expected values are now different in every row, but the same across all of the columns of any given row. The null hypothesis here is that the species are distributed uniformly across colonies of different size. The expected values within each row are the same, and they add up to the observed row total in the original matrix.

Running this test gives a highly significant Chi-square value (477.51; p < 0.001), so we reject the null hypothesis that species occur with equal frequency in colonies of all sizes. This result is expected because there is so much heterogeneity in the column totals of the matrix, which range from 36 (12S) to 440 (4S).

Now let's reverse the process and test for heterogeneity in the row totals. To do this, run the Chi-square analysis again for the Alaska seabirds.txt data file. This time, however, you will randomize columns, and load the data file Seabird expected columns.txt for the expected values. You will see that these expected values are again different. For this analysis, the values differ among columns, but are identical for every row within a column. In this case, we are testing the null hypothesis that the species occurrence frequencies are identical and are distributed randomly among the species richness classes.

As before, the chi-square deviation statistic (750.77) is significantly larger than the average value for the 1000 simulated matrices (202.30; p < 0.001), and we reject the null hypothesis. Again, this result is expected because the row totals vary considerably, from a minimum of 6 (RLK) to a maximum of 531 (GWG).

So, what have we learned? First, both the row and column totals were highly heterogeneous, as indicated by the two analyses in which we randomized within rows and randomized within columns. Species differ considerably in their frequency of occurrence (row totals) and there are different numbers of records of species-poor and species-rich colonies (column totals). However, these two factors do not entirely account for the data. We also rejected the two-way test for independence (the default analysis). Apparently, different species have different incidence functions and occur with differing frequencies in species-poor and species-rich seabird colonies. Whittam and Siegel-Causey (1981) describe some simple equations you can use to pinpoint which cells in the matrix are exhibiting positive and negative deviations.

It is interesting to think about how these incidence function analyses differ from those that we carried out in the Runs Test tutorial. First, there is a big difference in the nature of the data sets. In the Runs Test, we had a presence-absence matrix (only 1s and 0s) for a set of species in an archipelago of islands. For the Chi-square analysis, we had multiple records of species occurrences in colonies that were classified according to their size (number of species present). In the Runs Test, we examined the distribution of each individual species and asked if occurrences differed when sites were treated as equal (equiprobable null model) or as targets of varying size (proportional null model). In the Chi-square analysis, we asked whether the set of species, as a group, occurred with different frequencies in communities of different sizes (the 2-way independence test). Both tests tell us slightly different things about the distribution of species, and both are based on different kinds of data sets.

Although the statistical machinery for testing for incidence functions has been in place for some time, there have been relatively few tests for organisms other than birds on islands. If you have high-quality data sets for other taxa, this would be a worthwhile (and probably publishable) analysis to conduct.

Runs Test

To demonstrate the runs test, open the data file Finch occurrences.txt in the Tutorial Data Sets folder within EcoSim. These data are the same as in West Indies finches.txt and West Indies islands.txt, but organized differently. In this data set, each row is an island, each column is a species of finch, and the entries are the presence (1) or absence (0) of a finch species on a particular island. The first column of data is labeled Area, and gives the island areas (in square miles) for each island in the data set.
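If you want to explore these data outside of EcoSim, a file of this shape could be read along the following lines. This is only a sketch: it assumes the file is tab-delimited with island names in the first column, which you should verify against your own copy of the Tutorial Data Sets.

    import pandas as pd

    # Assumed layout: island name, Area, then one 0/1 column per finch species
    finches = pd.read_csv("Finch occurrences.txt", sep="\t", index_col=0)
    area = finches["Area"]            # island areas in square miles
    occ = finches["Tiara_olivacea"]   # presence (1) / absence (0) of one species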

We will use the runs test to examine the occurrence sequence of particular species across the set of islands. We want to know whether the distribution of presences and absences is non-random with respect to island area.

Once you have the data file loaded, select the Runs Test from the Analyze Menu (under Standard Tests), and set the random number seed to 10 (so the results match those in this tutorial) on the Preferences Tab. Next, choose Area as the x variable and choose the species Tiara_olivacea for analysis. Retain the default option of Equiprobable for the Simulation Constraints, and then run the model.

The Input Matrix Tab shows the original data, and illustrates that Tiara olivacea occurs only on the 4 largest islands in the West Indies (Cuba, Hispaniola, Jamaica, and Puerto Rico) and on Grand Cayman. The Simulation Tab shows one of the simulated data sets, sorted from smallest to largest island area. In this simulation, the 5 occurrences of this species have been randomly reshuffled among the islands, and now are found on St. Martin, St. Kitts, Barbados, Dominica, and Cuba.

The next three tabs illustrate the output and simulation results. The Length of Longest Run Tab shows that the observed longest run was 4 (4 occurrences were on islands of consecutively increasing size). In the simulated data sets, runs of only 1 or 2 were typical. The average run length for the simulated data sets was 1.90991, and runs of length 4 or greater (for this data set, the maximum possible run length is 5 and the minimum is 1) showed up in only 25 out of the 1000 simulation trials (p = 0.025). We conclude that the observed run is substantially longer than expected by chance if species occurrences were distributed randomly and equiprobably among islands.
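The logic of this tab can be sketched in a few lines of Python. The sketch below is not EcoSim's code; it uses made-up island areas and presences, and defines the statistic as the longest block of consecutive occupied islands once the islands are sorted by area.

    import numpy as np

    rng = np.random.default_rng(10)
    # Hypothetical island areas and one species' presences (0/1)
    area = np.array([5, 12, 40, 150, 900, 4000, 11000, 29000, 42000, 44000])
    occ = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

    def longest_run(o, a):
        # Longest block of consecutive occupied islands, islands sorted by area
        best = cur = 0
        for v in o[np.argsort(a)]:
            cur = cur + 1 if v == 1 else 0
            best = max(best, cur)
        return best

    obs_run = longest_run(occ, area)
    sims = [longest_run(rng.permutation(occ), area) for _ in range(1000)]
    p = np.mean([s >= obs_run for s in sims])
    print(f"observed longest run = {obs_run}, p = {p:.3f}")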

The Mean Tab shows the average island area of the occupied sites (16409.6 mi²), and compares it to the average area for the simulated data sets (4457.1 mi²). The observed mean is significantly larger (p = 0.002), and we conclude that this species occurs on larger islands than expected by chance if species occurrences were distributed randomly and equiprobably among islands.

The Variance Tab shows the variance in island area of the occupied sites. Although the observed run length and mean area were significantly large, the variance is not significantly small, because the occurrence on tiny Grand Cayman is an "outlier" that inflates the variance and leads to a non-significant pattern (p = 0.181).
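The Mean and Variance tail tests follow the same recipe. Continuing the runs-test sketch above (same hypothetical area, occ, and rng):

    # Tail tests for the Mean and Variance Tabs under the equiprobable shuffle
    occ_area = area[occ == 1]
    obs_mean, obs_var = occ_area.mean(), occ_area.var()
    sims = [rng.permutation(occ) for _ in range(1000)]
    p_mean = np.mean([area[s == 1].mean() >= obs_mean for s in sims])
    p_var = np.mean([area[s == 1].var() <= obs_var for s in sims])
    print(f"p(mean this large) = {p_mean:.3f}, p(variance this small) = {p_var:.3f}")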

Repeat the analysis for Tiara olivacea, but use Proportional for the Simulation Constraint. Now you have altered the model so that the islands are no longer equiprobable. Instead, each island behaves as a "target" and the probability of occurrence is proportional to the area of the island.

This is an important change in the null model, and you will see that your results change accordingly. For example, in the simulated data set that is illustrated, all 5 occurrences are on the 5 largest islands (Cuba, Hispaniola, Jamaica, Puerto Rico, and Guadeloupe). As you might predict, the observed run of 4 is now only slightly greater than the mean (3.842), and is no longer significant (p = 0.758). As before, the observed variance in the area of occupied islands is also not significantly small (p = 0.687).
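The proportional model changes only the randomization step: instead of shuffling occurrences equiprobably, the occupied islands are drawn without replacement with probability proportional to island area. A minimal sketch, again with made-up areas:

    import numpy as np

    rng = np.random.default_rng(10)
    area = np.array([5, 12, 40, 150, 900, 4000, 11000, 29000, 42000, 44000])

    def proportional_draw(a, k):
        # Draw k distinct islands with probability proportional to island area
        picked = rng.choice(len(a), size=k, replace=False, p=a / a.sum())
        occ = np.zeros(len(a), dtype=int)
        occ[picked] = 1
        return occ

    print(proportional_draw(area, 5))  # large islands are chosen far more often

The run, mean, and variance statistics are then computed on these simulated occurrence vectors exactly as before.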

Now that you understand how the analysis works, try analyzing two other species, Loxigilla noxis and Loxigilla violacea. Loxigilla noxis has an interesting distribution because it is missing from the larger islands, whereas Loxigilla violacea occurs only on Hispaniola and Jamaica.

The following table summarizes the results of all of these statistical tests:

                        Equiprobable              Proportional
                        Run   Mean   Variance     Run   Mean   Variance
Tiara olivacea          +     ++     ns           ns    ns     ns
Loxigilla noxis         ns    --     --           ns    ---    ---
Loxigilla violacea      ns    ns     ns           ns    ns     ns

In the table, a "+" indicates that the observed index was greater than expected, and a "-" indicates that the observed index was less than expected. One symbol = p < 0.05; two symbols = p < 0.01; three symbols = p < 0.001; "ns" = not significant (p > 0.05).

These results can be interpreted in the context of Diamond's (1975) incidence functions. Tiara olivacea corresponds to a "high-S" species that occurs most frequently on large islands. However, under the proportional model, such a distribution could arise by random colonization of islands with different site probabilities. In contrast, Loxigilla noxis is unusually absent from large islands under either null model, and matches Diamond's (1975) description of a "supertramp" species that only occurs in species-poor (= small island) communities. Finally, we did not reject the null hypothesis for Loxigilla violacea in any of the tests, and its distribution could be described as random. Note, however, that Loxigilla violacea has only two island occurrences. Species that are extremely sparse or extremely common will usually not cause the null hypothesis to be rejected unless the sample size (i.e., the number of islands) is very large, because there are relatively few re-arrangements of the data that can be produced with the null model.

Finally, it is interesting to think about these results in the context of the co-occurrence analysis, which also used the West Indies finch data. In the co-occurrence analysis (see the Co-occurrence Tutorial), we found that the distribution of species was highly non-random, and co-occurrence was less than expected by chance. Such patterns could arise if species respond differently to island area or other site factors. Thus, Tiara olivacea occurs disproportionately on large islands and Loxigilla noxis occurs disproportionately on small islands. In a co-occurrence analysis, such species pairs will be found together less often than expected by chance. But which came first, the chicken or the egg? Do species' individual responses to site conditions lead to patterns of negative co-occurrence? Or do negative species interactions cause species to distribute themselves differently with respect to site characteristics? The simulations can't answer these questions, but they can at least pinpoint where the non-random patterns are.

Analysis of angular data

To see an analysis of angular data, open the tutorial data file Darlingtonia wasp visits.txt. This file contains unpublished data of A.M. Ellison & N.J. Gotelli, collected on wasp visitation patterns to pitchers of the cobra lily Darlingtonia californica. Each row in the data set is a different plant in a ~0.25 m² plot, censused in the Siskiyou Mountains of southern Oregon. For each plant, we recorded the height of the plant in cm, the compass orientation of the pitcher opening (0 degrees = north), and the number of wasp visits recorded at the plant. Only 7 of the 51 plants received wasp visits. Most visited plants had only one or two visits, but Plant #35 received 3 visits. We will use these data to test whether wasp visits to plants are non-random with respect to pitcher orientation and height.

For the x-variable, select direction, and for the y-variable, use visited. The simulation constraint should be set to Equiprobable, the sampling algorithm to Without Replacement, and the data type to Angular.

In the output you can see that the length of the longest run was only 1, which was similar to the simulated values. The average angle of orientation for the visited plants was 67.93 degrees, compared to an average simulated angle of 73.20 degrees (tail p = 0.511). The variance or dispersion of the angles of the visited plants also appears random (p = 0.156).
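Because 0 and 360 degrees are the same direction, the mean and dispersion of angular data cannot be computed as ordinary arithmetic averages; circular statistics are used instead (Fisher 1993). Here is a minimal sketch of the circular mean and circular variance, using made-up angles rather than the Darlingtonia data:

    import numpy as np

    # Hypothetical pitcher orientations in degrees from north
    angles = np.radians([10.0, 95.0, 350.0, 40.0])
    s, c = np.sin(angles).mean(), np.cos(angles).mean()
    mean_angle = np.degrees(np.arctan2(s, c)) % 360  # circular mean direction
    r = np.hypot(s, c)                               # mean resultant length (0..1)
    circ_var = 1 - r                                 # circular variance (dispersion)
    print(f"mean angle {mean_angle:.1f} deg, circular variance {circ_var:.3f}")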

Now rerun the analysis, but this time change the sampling option to sampling with replacement. This analysis treats each wasp visit as an independent observation. Again, none of the simulation histograms appears non-random. However, notice that the observed length of the longest run is now 3, not 1. This change occurs because one of the plants had 3 visits, which are scored as a run of 3 in this analysis.
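To see why the longest run jumps from 1 to 3, note that under sampling with replacement each visit becomes its own observation, so a plant with three visits contributes three identical orientation values that necessarily sort next to one another. A tiny sketch with hypothetical numbers:

    import numpy as np

    direction = np.array([20.0, 135.0, 270.0])  # hypothetical visited-plant angles
    visits = np.array([1, 3, 1])                # hypothetical visit counts per plant
    per_visit = np.repeat(direction, visits)    # one observation per visit
    print(per_visit)  # 135.0 appears 3 times in a row -> a run of 3 at that plant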

Do these results mean that wasp visits to Darlingtonia are random? Not entirely. Try running the analysis using plant height as the continuous predictor variable. Be sure to change the data type from "angular" back to "linear" before running this model! You will see that the visited plants are significantly taller than the unvisited plants, whether the model samples with or without replacement. Thus, wasps appear to visit taller plants more frequently than shorter plants, but visitation is random with respect to pitcher orientation.

8. Literature Cited

Cody, M.L. 1974. Competition and the structure of bird communities. Princeton University Press, Princeton.

Coleman, B.D., M.A. Mares, M.R. Willig and Y.-H. Hsieh. 1982. Randomness, area, and species richness. Ecology 63: 1121-1133.

Diamond, J.M. 1975. Assembly of species communities. In: Ecology and evolution of communities (ed. M.L. Cody and J.M. Diamond). pp. 342-444. Harvard University Press, Cambridge.

Edgington, E.S. 1987. Randomization tests. Marcel Dekker, New York.

Efron, B. and G. Gong. 1983. A leisurely look at the bootstrap, the jackknife, and cross-validation. American Statistician 37: 36-48.

Fienberg, S.E. 1980. The analysis of cross-classified categorical data. MIT Press, Cambridge.

Fisher, N.I. 1993. Statistical analysis of circular data. Cambridge University Press, Cambridge.

Garland, T., Jr., A.W. Dickerman, C.M. Janis, and J.A. Jones. 1993. Phylogenetic analysis of covariance by computer simulation. Systematic Biology 42: 265-292.

Gotelli, N.J. and C.M. Taylor. 1999a. Testing metapopulation models with stream-fish assemblages. Evolutionary Ecology Research 1: 835-845.

Gotelli, N.J. and C.M. Taylor. 1999b. Testing macroecology models with stream-fish assemblages. Evolutionary Ecology Research 1: 847-858.

Harvey, P.H. and M.D. Pagel. 1991. The comparative method in evolutionary biology. In: Oxford Series in Ecology and Evolution (ed. R.M. May and P.H. Harvey). Oxford University Press, Oxford.

Hilborn, R. and M. Mangel. 1997. The ecological detective: confronting models with data. Princeton University Press, Princeton.

James, F.C. and C.E. McCulloch. 1985. Data analysis and the design of experiments in ornithology. In: Current Ornithology, vol. 2 (ed. R.F. Johnston). pp. 1-63. Plenum Publishing Corporation.

Lee, D.S., C.R. Gilbert, C.H. Hocutt, R.E. Jenkins, D.E. McAllister, and J.R. Stauffer, Jr. eds. 1980. Atlas of North American Freshwater Fishes. North Carolina State Museum of Natural History, Raleigh, NC.

Magurran, A.E. 1988. Ecological diversity and its measurement. Princeton University Press, Princeton.

Manly, B.F.J. 1991. Randomization and Monte Carlo methods in biology. Chapman and Hall, London.

McGuinness, K.A. 1988. Explaining patterns in abundances of organisms on boulders: the failure of 'natural experiments'. Marine Ecology Progress Series 48: 199-204.

Resetarits, W.J., Jr. and J. Bernardo. 1998. Experimental ecology: issues and perspectives. Oxford University Press, New York.

Ricklefs, R.E. and J.M. Starck. 1996. Applications of phylogenetically independent contrasts: a mixed progress report. Oikos 77: 167-172.

Schoener, T.W. and A. Schoener. 1983. Distribution of vertebrates on some very small islands. I. Occurrence sequences of individual species. Journal of Animal Ecology 52: 209-235.

Simberloff, D. and N. Gotelli. 1984. Effects of insularisation on plant species richness in the prairie-forest ecotone. Biological Conservation 29: 27-46.

Sokal, R.R. and F.J. Rohlf. 1995. Biometry. Third edition. W.H. Freeman and Company, New York.

Whittam, T.S. and D. Siegel-Causey. 1981. Species incidence functions and Alaskan seabird colonies. Journal of Biogeography 8: 421-425.

Wilson, J.B. 1993. Would we recognise a Broken-Stick community if we found one? Oikos 67: 181-183.


All Pages Copyright © 2003
by Kesey-Bear and Acquired Intelligence, Inc.
All rights reserved.