Basic Statistical Concepts for Nurses

Introduction
As the context of health care changes with advances in pharmaceutical services and technology, nurses and other health care professionals need to be prepared to respond in knowledgeable and practical ways. Health information is very often expressed in statistical terms to make it concise and understandable. Statistics plays a vitally important role in research: it helps to answer important research questions, and it is the answers to such questions that further our understanding of the field and provide material for academic study. The researcher must understand which tools are suitable for a particular research study. It is essential for health care professionals to have a grasp of the basic concepts of statistics, as it enables them to read and evaluate reports and other literature and to undertake independent research investigations by selecting the most appropriate statistical test for their problems. The purpose of analyzing data in a study is to describe the data in meaningful terms.
Descriptive approach and inferential approach
Depending on the kinds of variables identified (nominal, ordinal, interval, and ratio) and the design of the particular study, a number of statistical techniques are available to analyze data. There are two approaches to the statistical analysis of data: the descriptive approach and the inferential approach. Descriptive statistics convert data into a picture of the information that is readily understandable. The inferential approach helps to decide whether the outcome of the study is a result of factors planned within the design of the study or determined by chance. The two approaches are often used sequentially: first, data are described with descriptive statistics, and then additional statistical manipulations are done to make inferences about the likelihood that the outcome was due to chance. When the descriptive approach is used, terms like mean, median, mode, variation, and standard deviation are used to communicate the analysis of the data. When the inferential approach is used, probability (P) values are used to communicate the significance or lack of significance of the results (Streiner & Norman, 1996).

Measurement
Measurement is defined as the “assignment of numerals according to rules” (Tyler, 1963: 7). Regardless of the variables under study, in order to make sense of the data collected, each variable must be measured in such a way that its magnitude or quantity is clearly identified. The specific strategy for a particular study depends upon the research problem, the sample under study, the availability of instruments, and the general feasibility of the project (Brockopp & Hastings-Tolsma, 2003). A variety of measurement methods are available for use in nursing research. Four measurement scales are used: nominal, ordinal, interval and ratio.

The nominal level of measurement
The nominal level of measurement is the most primitive or lowest level of classifying information. Nominal variables are categories of people, events, and other phenomena that are named, are exhaustive in nature, and are mutually exclusive. These categories are discrete and non-continuous. For nominal measurement, the admissible statistical operations are counting of frequency, percentage, proportion, mode, and the coefficient of contingency.

The ordinal level of measurement
The ordinal level of measurement is second in terms of its refinement as a means of classifying information. Ordinal implies that the values of variables can be rank-ordered from highest to lowest.

Interval Level of Measurement
The interval level of measurement is quantitative in nature. The individual units are equidistant from one point to the other, but interval data do not have an absolute zero; temperature measured in Celsius or Fahrenheit is an example. The interval level refers to the third level of measurement in relation to the complexity of statistical techniques that can be used to analyze data. Variables within this level of measurement are assessed incrementally, and the increments are equal.

Ratio Level of Measurement
The ratio level of measurement is characterized by variables that are assessed incrementally with equal distances between the increments and a scale that has an absolute zero. Ratio variables exhibit the characteristics of ordinal and interval measurement, and one value can also be compared with another as two or three times it, or as one-third, one-quarter, and so on. Variables like time, length and weight are ratio scales, and they can also be measured using nominal or ordinal scales. The mathematical properties of interval and ratio scales are very similar, so the statistical procedures are common to both scales.

Errors of measurement
When a variable is measured there is the potential for errors to occur. Some of the sources of error in measurement are instrument clarity, variations in administration, situational variations, response-set bias, transitory personal factors, response sampling, and instrument format.

Population, Sample, Variable
A population is defined as the entire collection of a set of objects, people, or events in a particular context: the entire group of persons or objects that is of interest to the investigator. In statistics, a population means any collection of individual items or units that is the subject of investigation; it refers to the collection of all items upon which statements will be based. This might include all patients with schizophrenia in a particular hospital, or all depressed individuals in a certain community.
Characteristics of a population that differ from individual to individual are called variables. A variable is a concept (construct) that has been so specifically defined that precise observation, and therefore measurement, can be accomplished. Length, age, weight, temperature, and pulse rate are a few examples of variables.
The sample is a subset of the population selected by the investigator to participate in a research study; it refers to a subset of observations selected from the population. It would be unusual for an investigator to describe only the patients with schizophrenia in a particular hospital, and it is unlikely that an investigator will measure every depressed person in a community. As it is rarely practicable to obtain measures of a particular variable from all the units in a population, the investigator has to collect information from a smaller group or subset that represents the group as a whole. This subset is called a sample. Each unit in the sample provides a record, such as a measurement, which is called an observation. The sample represents the population with respect to those critical characteristics the investigator plans to study.

Dependent and independent variables
An independent variable is the presumed cause of the dependent variable, the presumed effect. The independent variable is the one which explains or accounts for variations in the dependent variable; it is one whose change results in change in another variable. In experiments, the independent variable is the variable manipulated by the experimenter. A dependent variable is one which changes in relationship to changes in another variable. A variable which is dependent in one study may be independent in another. An intervening variable is one that comes between the independent and dependent variables.

Hypothesis
A hypothesis is a statement or declaration of the expected outcome of a research study. It is based on a logical rationale and has empirical possibilities for testing. Hypotheses are formulated in experimental research; in some non-experimental correlational studies, hypotheses may also be developed. Normally, there are four elements in a hypothesis:
  • (1) dependent and independent variables,
  • (2) some type of relationship between independent and dependent variable,
  • (3) the direction of the change, and
  • (4) the subjects, i.e. the population being studied.
It is defined as “A tentative assumption made in order to draw out and test its logical or empirical consequences” (Webster 1968).
Standards in formulating a hypothesis (Ahuja, R. 2001):
  • It should be empirically testable, whether it is right or wrong.
  • It should be specific and precise.
  • The statements in the hypothesis should not be contradictory.
  • It should specify the variables between which the relationship is to be established.
  • It should describe one issue only.
Characteristics of a Hypothesis (Treece & Treece, 1989)
  • It is testable
  • It is logical
  • It is directly related to the research problem
  • It is factually or theoretically based
  • It states a relationship between variables
  • It is stated in such a form that it can be accepted or rejected
A directional hypothesis predicts an outcome in a particular direction, while a non-directional hypothesis simply states that there will be a difference between the groups. There can be two hypotheses: the research hypothesis and the null hypothesis. The null hypothesis is formed for the statistical purpose of negating it. If the research hypothesis states that there is a positive correlation between smoking and cancer, the null hypothesis states that there is no relation between smoking and cancer. It is easier to negate a statement than to establish it.
The null hypothesis is a statistical statement that there is no difference between the groups under study. A statistical test is used to determine the probability that the null hypothesis is not true so that it can be rejected; that is, inferential statistics are used in an effort to reject the null hypothesis, thereby showing that a difference does exist. The null hypothesis is a technical necessity when using inferential statistics, with statistical significance used as the criterion.

Types of errors
When the null hypothesis is rejected, the observed differences between groups are deemed unlikely to have occurred by chance alone. For example, if drug A is compared to a placebo for its effects on depression and the null hypothesis is rejected, the investigator concludes that the observed differences most likely are not explainable simply by sampling error. The key word in these statements is probable. When offering this conclusion, the investigator has the odds on his or her side. However, what are the chances of the statement being incorrect?
In statistical inference there is no way to say with certainty that rejection or retention of the null hypothesis was correct. There are two types of potential errors. A type I error occurs when the null hypothesis is rejected when indeed it should have been retained; a type II error occurs if the null hypothesis is retained when indeed it should have been rejected.

Type I Error
Type I errors occur when the null hypothesis is rejected but should have been retained, such as when a researcher decides that two means are different. He or she might conclude that the treatment works, or that the groups are not sampled from the same population, whereas in reality the observed differences are attributable only to sampling error. In a conservative scientific setting, type I errors should be made rarely: there is a great disadvantage to advocating treatments that really do not work.
The probability of a type I error is denoted by the Greek letter alpha (α). Because of the desire to avoid type I errors, statistical models have been created so that the investigator has control over the probability of a type I error. At the .05 significance or alpha level, a type I error is expected to occur in 5 percent of all cases; at the .01 level, in 1 percent of all cases. Thus, at the .05 α level, one type I error is expected in each 20 independent tests; at the .01 α level, one type I error is expected in each 100 independent tests.

Type II Error
The motivation to avoid a type I error might increase the probability of making a second type of error, in which the null hypothesis is retained when it is actually false. For example, an investigator may conclude that a treatment does not work when actually it is efficacious. The probability of a type II error is symbolized by the Greek letter beta (β).

Statistical Power
Several maneuvers increase control over the probabilities of the different types of errors and of correct decisions. One type of correct decision is rejecting the null hypothesis and being correct in that decision. Power is defined as the probability of rejecting the null hypothesis when it should be rejected. Ultimately, the statistical evaluation will be more meaningful if it has high power.
It is particularly important to have high statistical power when the null hypothesis is retained. Retaining the null hypothesis with high power gives the investigator more confidence in stating that differences between groups were non-significant. One factor that affects power is the sample size: as the sample size increases, power increases. The larger the sample, the greater the probability that a correct decision will be made in rejecting or retaining the null hypothesis.
Another factor that influences power is the significance level. As the significance level becomes less stringent (a larger alpha), power increases. For instance, if the .05 level is selected rather than the .01 level, there will be a greater chance of rejecting the null hypothesis; however, there will also be a higher probability of a type I error. Conversely, by reducing the chances of a type I error, the chances of correctly identifying a real difference (power) are also reduced. Thus, the safest manipulation for increasing power without affecting the probability of a type I error is to increase the sample size.
The third factor affecting power is effect size. The larger the true difference between two groups, the greater the power. Experiments attempting to detect a very strong effect, such as the impact of a very potent treatment, might have substantial power even with small sample sizes, whereas the detection of subtle effects may require very large samples in order to achieve reasonable statistical power. It is worth noting that not all statistical tests have equal power: the probability of correctly rejecting the null hypothesis is higher with some statistical methods than with others. Nonparametric statistics, for example, are typically less powerful than parametric statistics.
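To illustrate how sample size, alpha and effect size interact, the sketch below uses the statsmodels package (an assumption; any power calculator gives similar figures) to estimate the number of participants needed per group for a two-sample t-test. The effect sizes, alpha and power values are purely illustrative.

```python
# A minimal power-analysis sketch, assuming the statsmodels package is
# available; all numbers are illustrative, not from the text above.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed to detect a medium effect (d = 0.5)
# with alpha = .05 and power = .80 in a two-sided independent t-test.
n_medium = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                alternative='two-sided')
print(round(n_medium))   # roughly 64 per group

# A larger effect (d = 0.8) needs far fewer participants for the same power.
n_large = analysis.solve_power(effect_size=0.8, alpha=0.05, power=0.80)
print(round(n_large))    # roughly 26 per group
```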

Sampling
The process of selecting a fraction of the sampling units (i.e. a collection with specified dimensions) of the target population for inclusion in the study is called sampling. Sampling can be probability sampling or non-probability sampling.

Probability Sampling or Random sampling
Probability sampling, also called random sampling, is a selection process that ensures each member of the population has the same probability of being selected. It is the process of selecting samples based on probability theory, which quantifies the possibility that events occur by chance. Random sampling is the best method for ensuring that a sample is representative of the larger population. Random sampling can be simple random sampling, stratified random sampling, or cluster sampling.
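A minimal sketch of simple random sampling, assuming a hypothetical sampling frame of patient identifiers: NumPy's random generator draws each unit with equal probability and without replacement.

```python
# Simple random sampling from a hypothetical frame of 500 patient IDs;
# every ID has the same chance of selection and none can be drawn twice.
import numpy as np

rng = np.random.default_rng(seed=42)   # seed only for reproducibility
sampling_frame = [f"patient_{i:03d}" for i in range(500)]

sample = rng.choice(sampling_frame, size=50, replace=False)
print(sample[:5])                      # first few selected units
```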

Nonprobability sampling
Nonprobability sampling is a selection process in which the probability that any one individual or subject is selected is not equal to the probability that another individual or subject may be chosen. The probability of inclusion, and the degree to which the sample represents the population, are unknown. The major problem with nonprobability sampling is that sampling bias can occur. Nonprobability sampling can be convenience sampling, purposive sampling, or quota sampling.

Sampling Error (Standard Error)
Sampling error refers to the discrepancies that inevitably occur when a small group (sample) is selected to represent the characteristics of a larger group (population). It is defined as the difference between a parameter and an estimate of that parameter derived from a sample (Lindquist, 1968). The means and standard deviations calculated from data collected on a given sample will not be exactly the same as those calculated from data collected on the entire population. It is this discrepancy between the characteristics of the sample and the population that constitutes sampling error.

Descriptive statistics
Descriptive statistics are techniques which help the investigator to organize, summarize and describe measures of a sample. Here no predictions or inferences are made regarding population parameters. Descriptive statistics are used to summarize observations and to place these observations within context. The most common descriptive statistics include measures of central tendency and measures of variability.

Central tendency or “measures of the middle”
There are three commonly used measures of central tendency: the mean, the median, and the mode, which identify the average, the most typical, and the most common values, respectively, among the data collected. The mean is the arithmetic average, the median is the point representing the 50th percentile in a distribution, and the mode is the most common score. Sometimes each of these measures is the same; on other occasions the mean, the median, and the mode can be different. The three are the same when the distribution of scores is normal, but under most circumstances they will not be exactly the same. The mode is most likely to misrepresent the underlying distribution and is rarely used in statistical analysis. The mean and the median are the most commonly reported measures of central tendency.
The major consideration in choosing between them is how much weight should be given to extreme scores. The mean takes into account each score in the distribution; the median finds only the halfway point. Because the mean best represents all subjects and has desirable mathematical properties, it is typically favored in statistical analysis. Despite the advantages of the mean, the median also has advantages. In particular, the median disregards outlier cases, whereas the mean moves further in the direction of the outliers. Thus, the median is often used when the investigator does not want scores in the extremes of the distribution to have a strong impact. The median is also valuable for summarizing data from a measure that might be insensitive toward the higher ranges of the scale. For instance, a very easy test may have a ceiling effect that hides the true ability of some test-takers: a ceiling effect occurs when the test is too easy to measure the true ability of the best students. If some scores stack up at the extreme, the median may be more accurate than the mean, because if the high scores had not been bounded by the highest obtainable score, the mean would actually have been higher.
The mean, median, and mode are exactly the same in a normal distribution. However, not all distributions of scores have a normal or bell-shaped appearance. The highest point in a distribution of scores is called the modal peak. A distribution with the modal peak off to one side or the other is described as skewed. The word skew literally means "slanted."
The direction of skew is determined by the location of the tail or flat area of the distribution. Positive skew occurs when the tail goes off to the right of the distribution. Negative skew occurs when the tail or low point is on the left side of the distribution. The mode is the most frequent score in the distribution. In a skewed distribution, the mode remains at the peak whereas the mean and the median shift away from the mode in the direction of the skewness. The mean moves furthest in the direction of the skewness, and the median typically falls between the mean and the mode. Mode is the best measure of central tendency when nominal variables are used. Median is the best measure of central tendency when ordinal variables are used. Mean is the best measure of central tendency when interval or ratio scales are used.
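The small sketch below computes the three measures for a hypothetical, positively skewed set of lengths of hospital stay (in days); the single outlier pulls the mean above the median, as described above.

```python
# Mean, median, and mode for a hypothetical, positively skewed sample
# of lengths of hospital stay (days); values are illustrative only.
import statistics

stay_days = [2, 3, 3, 3, 4, 4, 5, 6, 7, 30]   # 30 is an outlier

print(statistics.mean(stay_days))    # 6.7 -> pulled toward the outlier
print(statistics.median(stay_days))  # 4.0 -> the 50th percentile
print(statistics.mode(stay_days))    # 3   -> the most frequent value
```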

Measures of Variability
If there were no variability within populations there would be no need for statistics: a single item or sampling unit would tell us all that is needed to know about the population as a whole. Three indices are used to measure variation or dispersion among scores: (1) the range, (2) the variance, and (3) the standard deviation (Cozby, 2000). The range describes the difference between the largest and smallest observations made; the variance and standard deviation are based on the average difference, or deviation, of observations from the mean.
Measures of central tendency, such as the mean and median, are used to summarize information. They are important because they provide information about the average score in the distribution. Knowing the average score, however, does not provide all the information required to describe a group of scores. In addition, measures of variability are required. The simplest method of describing variability is the range, which is simply the difference between the highest score and lowest score.
Another statistic, known as the interquartile range, describes the interval of scores bounded by the 25th and 75th percentile ranks; that is, the range of scores representing the middle 50 percent of the distribution. In contrast to ranges, which are used infrequently in statistical analysis, the variance and standard deviation are used commonly. Since the mean is the average score in a distribution, the sum of the deviations around the mean will always equal zero, so some other estimate of deviation around the mean is needed to describe the characteristics of a distribution of scores. The squared deviations around the mean yield such a meaningful index: the variance is the sum of the squared deviations around the mean divided by the number of cases.

Range
The range is the simplest method of examining variation among scores and refers to the difference between the highest and lowest values produced. It shows how wide the distribution is over which the measurements are spread. For continuous variables, the range is the arithmetic difference between the highest and lowest observations in the sample. In the case of counts or measurements, 1 should be added to the difference because the range is inclusive of the extreme observations. The range takes account of only the most extreme observations; it is therefore limited in its usefulness, because it gives no information about how the observations are distributed. The interquartile range is the area between the lowest quartile and the highest quartile, i.e. the middle 50% of the scores.

Variance
The variance is a very useful statistic and is commonly employed in data analysis. Its calculation requires finding the squared deviations around the mean rather than the simple or absolute deviations around the mean; thus, the resulting value is expressed in squared units of the original measure. One method for calculating the variance is to first calculate the deviation scores (the sum of a set of deviation scores is always equal to zero); the squared deviations of a distribution of scores can then be used to calculate the variance. Taking the square root of the variance puts the observations back into their original metric: the square root of the variance is the standard deviation, and conversely, the variance is the square of the standard deviation. The standard deviation is an approximation of the average deviation around the mean; although not technically equal to the average deviation, it gives an approximation of how much the average score deviates from the mean.

Standard Deviation
The standard deviation is the most widely applied measure of variability. When observations have been obtained from every item or sampling unit in a population, the symbol for the standard deviation is σ (lower-case sigma); this is a parameter of the population. When it is calculated from a sample it is symbolized by s. The standard deviation of a distribution of scores is the square root of the variance. Large standard deviations suggest that scores do not cluster around the mean and are probably widely scattered; small standard deviations suggest that there is very little difference among scores.
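A brief sketch of these indices of variability using NumPy on hypothetical blood-pressure readings; ddof=1 gives the sample (rather than population) variance and standard deviation.

```python
# Range, variance, standard deviation, and interquartile range of
# hypothetical systolic blood pressure readings (mmHg).
import numpy as np

sbp = np.array([118, 122, 125, 130, 131, 135, 140, 142, 150, 155])

print(sbp.max() - sbp.min())        # range
print(np.var(sbp, ddof=1))          # sample variance (squared units)
print(np.std(sbp, ddof=1))          # sample standard deviation (mmHg)
print(np.percentile(sbp, 75) - np.percentile(sbp, 25))  # interquartile range
```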

Normal Distribution
The normal distribution is a mathematical construct which suggests that naturally occurring observations follow a given pattern. The pattern is the normal curve, which places most observations at the mean and fewer observations at either extreme. This bell-shaped curve reflects the tendency of observations of a specific variable to cluster in a particular manner.
The normal curve can be described for any set of data, given the mean and standard deviation of the data and the assumption that the characteristic under study is normally distributed within the population. A normal distribution implies that about 68% of observations fall within one standard deviation of the mean, about 95% fall within two standard deviations of the mean, and about 99.7% fall within three standard deviations of the mean. Theoretically, the range of the curve is unlimited.

Standard Scores
One of the problems with means and standard deviations is that their meanings are not independent of context. For example, a mean of 45.6 means little unless the scale and spread of the scores are known. The Z-score is a transformation into standardized units that provides a context for the interpretation of scores. The Z-score is the difference between a score and the mean, divided by the standard deviation. To make comparisons between groups, standard scores rather than raw scores can be used. Standard scores enable the investigator to examine the position of a given score by measuring its deviation from the mean of all scores.
Most often, the units on the x axis of the normal distribution are in Z-units. Any variable transformed into Z-units will have a mean of 0 and a standard deviation of 1. Translation of Z-scores into percentile ranks is accomplished using a table for the standard normal distribution. Certain Z-scores are of particular interest in statistics and psychological testing. The Z-score 1.96 represents the 97.5th percentile in a distribution whereas -1.96 represents the 2.5th percentile. A Z-score of less than -1.96 or greater than +1.96 falls outside of a 95 percent interval bounding the mean of the Z-distribution. Some statistical definitions of abnormality view these defined deviations as cutoff points. Thus, a person who is more than 1.96 Z-scores from the mean on some attribute might be regarded as abnormal. In addition to the interval bounded by 95 percent of the cases, the interval including 99 percent of all cases is also commonly used in statistics.
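A short sketch of converting a raw score to a Z-score and then to a percentile rank using SciPy's standard normal distribution; the mean, standard deviation, and raw score are hypothetical.

```python
# Raw score -> Z-score -> percentile rank, assuming a hypothetical test
# with mean 45.6 and standard deviation 8.0.
from scipy.stats import norm

mean, sd = 45.6, 8.0
raw_score = 60.0

z = (raw_score - mean) / sd          # Z = (X - mean) / SD
percentile = norm.cdf(z) * 100       # area under the normal curve below z

print(round(z, 2))                   # 1.8
print(round(percentile, 1))          # about the 96.4th percentile
```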

Confidence Intervals
In most statistical inference problems the sample mean is used to estimate the population mean. Each sample mean is considered to be an unbiased estimate of the population mean. Although the sample mean is unlikely to be exactly the same as the population mean, repeated random samples will form a sampling distribution of sample means. The mean of the sampling distribution is an unbiased estimate of the population mean. However, taking repeated random samples from the population is also difficult and expensive. Instead, it is necessary to estimate the population mean based on a single sample; this is done by creating an interval around the sample mean.
The first step in creating this interval is finding the standard error of the mean. The standard error of the mean is the standard deviation divided by the square root of the sample size. Statistical inference is used to estimate the probability that the population mean will fall within some defined interval. Because sample means are distributed normally around the population mean, the sample mean is most probably near the population value. However, it is possible that the sample mean is an overestimate or an underestimate of the population mean. Using information about the standard error of the mean, it is possible to put a single observation of a mean into context.
The ranges that are likely to capture the population mean are called confidence intervals. Confidence intervals are bounded by confidence limits. The confidence interval is defined as a range of values with a specified probability of including the population mean. A confidence interval is typically associated with a certain probability level. For example, the 95 percent confidence interval has a 95 percent chance of including the population mean. A 99 percent confidence interval is expected to capture the true mean in 99 of each 100 cases. The confidence limits are defined as the values for points that bound the confidence interval. Creating a confidence interval requires a mean, a standard error of the mean, and the Z-value associated with the interval.
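The sketch below builds a 95 percent confidence interval around a hypothetical sample mean exactly as described: mean plus or minus Z times the standard error, with Z = 1.96.

```python
# 95% confidence interval for a population mean from a hypothetical
# sample: mean +/- 1.96 * standard error of the mean.
import numpy as np
from scipy import stats

sample = np.array([72, 75, 78, 80, 81, 83, 85, 88, 90, 94])  # illustrative

mean = sample.mean()
sem = stats.sem(sample)              # standard deviation / sqrt(n)
z = 1.96                             # Z-value bounding the middle 95%

lower, upper = mean - z * sem, mean + z * sem
print(round(mean, 1), (round(lower, 1), round(upper, 1)))
```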

Inferential statistics
Inferential statistics are mathematical procedures which help the investigator to predict or infer population parameters from sample measures. This is done by a process of inductive reasoning based on the mathematical theory of probability (Fowler, J., Jarvis, P. & Chevannes M. 2002).

Probability
The idea of probability is basic to inferential statistics. The goal of inferential statistical techniques is the same: to determine as precisely as possible the probability of an occurrence. Probability can be regarded as quantifying the chance that a stated outcome of an event will take place; in research it refers to the likelihood that the differences between groups under study are the result of chance. Probability theory considers any given event out of all possible outcomes: the probabilities of a set of mutually exclusive and exhaustive outcomes add up to one. When a coin is tossed it has two outcomes, head or tail, i.e. a 0.5 chance for head and a 0.5 chance for tail; added together these give 1. Similarly, if there are fifty students in a class, the chance of a particular student ranking first in the class is 1 in 50 (i.e. .02). By convention, probability values fall on a scale between 0 (impossibility) and 1 (certainty), but they are sometimes expressed as percentages, so the probability scale has much in common with the proportion scale. The chance of committing a type I error is decided by testing the hypothesis against a chosen probability value. In the behavioural sciences, < .05 is usually taken as the alpha value for testing the hypothesis; when more stringent criteria are required, < .01 or < .001 is taken as the alpha or P value.

Statistical Significance (alpha level)
The level of significance (or alpha level) is determined to identify the probability that the difference between the groups has occurred by chance rather than in response to the manipulation of variables. The decision of whether the null hypothesis should be rejected depends on the level of error that can be tolerated. This tolerance level of error is expressed as a level of significance or alpha level. The usual level of significance is 0.05, although levels of 0.01 or 0.001 may be used when a high level of accuracy is required. In testing the significance of obtained statistics, if the investigator rejects the null hypothesis when, in fact, it is true, he commits a type I or alpha error; if he accepts the null hypothesis when, in fact, it is false, he commits a type II or beta error (Singh, 2002).

Parametric and Non-parametric Tests
Parametric and non-parametric tests are commonly employed in behavioural research.

Parametric Tests
A parametric test is one which specifies certain conditions about the parameter of the population from which a sample is taken.  Such statistical tests are considered to be more powerful than non-parametric tests and should be used if their basic requirements or assumptions are met. Assumptions for using parametric tests:
  • The observation must be independent.
  • The observation must be drawn from a normal distribution.
  • The samples drawn from the populations must have equal variances (homogeneity of variance); this condition is more important if the sample size is particularly small.
  • The variables must be expressed in interval or ratio scales.
  • The variables under study should be continuous.
Examples of parametric tests are t-test, z-test and F-test.

Non-parametric tests
A non-parametric test is one that does not specify any conditions about the parameters of the population from which the sample is drawn. These tests are also called distribution-free statistics. For non-parametric tests, the variables under study should be continuous and the observations should be independent. Requisites for using a non-parametric statistical test are:
  • The shape of the distribution of the population from which the sample is drawn is not known to be a normal curve.
  • The variables have been quantified on the basis of nominal measures (or frequency counts)
  • The variables have been quantified on the basis of ordinal measures or ranking.
  • A non-parametric test should be used only when parametric assumptions cannot be met.
Common non-parametric tests
  • Chi-square test
  • Mann-Whitney U test
  • Rank difference methods (Spearman’s rho and Kendall’s tau)
  • Coefficient of concordance (W)
  • Median test
  • Kruskal-Wallis test
  • Friedman test
Tips on using appropriate tests in experimental design
Two unmatched (unrelated) groups, experimental and control (e.g. patients receiving a planned therapeutic intervention for depression and a control group of patients on routine care):
  • See the distribution, whether normal or non-normal
  • If normal, use parametric tests (independent t-test)
  • If non-normal, use a nonparametric test (Mann-Whitney U test), or make the data normal through natural log transformation or z-transformation and then use a parametric test (this choice is illustrated in the sketch following these tips).
Two matched (related) groups, pre-post design (the same group is rated before the intervention and again after the period of intervention, i.e. two ratings in the same or related group):
  • See distribution, whether normal or non-normal
  • If normal, use the parametric paired t-test.
  • If non-normal, use the nonparametric Wilcoxon signed-rank test.
More than two unmatched (unrelated) groups (for example three groups: schizophrenia, bipolar disorder and control):
  • See the distribution, whether normal or non-normal
  • If normally distributed, use parametric one-way ANOVA
  • If non-normal, use the nonparametric Kruskal-Wallis test
More than two matched (related) groups (for example, in an ongoing intervention, ratings at different times: t1, t2, t3, t4 …):
  • See distribution, normal or non-normal
  • If the data are normal, use parametric repeated-measures ANOVA
  • If the data are non-normal, use the nonparametric Friedman test
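As a compact illustration of the decision rules above, the sketch below (using SciPy and purely hypothetical scores) checks normality first and then picks the parametric or non-parametric test for two independent groups; the same pattern extends to paired data (ttest_rel / wilcoxon), three or more independent groups (f_oneway / kruskal), and repeated ratings (friedmanchisquare).

```python
# Choosing between a parametric and a non-parametric two-group test,
# following the tips above; the data are purely illustrative.
from scipy import stats

treatment = [14, 16, 15, 18, 20, 17, 19, 21, 16, 18]   # hypothetical scores
control   = [12, 13, 15, 14, 12, 16, 13, 15, 14, 13]

# Step 1: check normality of each group (Shapiro-Wilk).
normal = all(stats.shapiro(g).pvalue > 0.05 for g in (treatment, control))

# Step 2: pick the test accordingly.
if normal:
    result = stats.ttest_ind(treatment, control)          # independent t-test
else:
    result = stats.mannwhitneyu(treatment, control,
                                alternative='two-sided')   # Mann-Whitney U
print(result)
```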
Matched (related) and unmatched (unrelated) observations
When analyzing bivariate data such as correlations, a single sample unit gives a pair of observations representing two different variables. The observations comprising a pair are uniquely linked and are said to be matched or paired. For example, the systolic blood pressures of 10 patients and the measurements of another 10 patients after administration of a drug are unmatched; however, measurements of the same 10 patients before and after administration of the drug are matched. It is possible to conduct a more sensitive analysis when the observations are matched.

Common Statistical Tests
Chi-square (χ²) Test (analyzing frequencies)
The chi-square test is one of the most important non-parametric tests; Guilford (1956) called it the ‘general-purpose statistic’. Chi-square tests are widely referred to as tests of homogeneity, randomness, association, independence, and goodness of fit. The chi-square test is used when the data are expressed in terms of frequencies, proportions, or percentages. The test applies only to discrete data, but continuous data can be reduced to categories in such a way that they can be treated as discrete. The chi-square statistic is used to evaluate the relative frequency or proportion of events in a population that fall into well-defined categories. For each category there is an expected frequency, obtained from knowledge of the population or from some other theoretical perspective, and an observed frequency, obtained from observations made by the investigator. The chi-square statistic expresses the discrepancy between the observed and the expected frequencies.
There are several uses of the chi-square test:
1. The chi-square test can be used as a test of the equal-probability hypothesis (the hypothesis that the frequencies in all the given categories are equally probable).
2. It can be used to test the significance of the independence hypothesis (the hypothesis that one variable is not affected by, or related to, another variable, and hence that the two variables are independent).
3. It can be used to test a hypothesis regarding the normal shape of a frequency distribution (goodness of fit).
4. It is used in testing the significance of several statistics, such as the phi coefficient, the coefficient of concordance, and the coefficient of contingency.
5. In the chi-square test, the frequencies we observe are compared with those we expect on the basis of some null hypothesis. If the discrepancy between the observed and expected frequencies is great, the value of the calculated test statistic will exceed the critical value at the appropriate number of degrees of freedom, and the null hypothesis is rejected in favor of some alternative. Mastery of the method lies not so much in the computation of the test statistic itself as in the calculation of the expected frequencies.
6. The chi-square statistic does not give any information regarding the strength of a relationship: it only conveys the existence or non-existence of a relationship between the variables investigated. To establish the extent and nature of the relationship, additional statistics such as phi, Cramer’s V, or the contingency coefficient can be used (Brockopp & Hastings-Tolsma, 2003).
Tips on analyzing frequencies
  • All versions of the chi-square test compare the agreement between a set of observed frequencies and those expected if some null hypothesis is true.
  • Objects are counted on a nominal scale, or unambiguous intervals on a continuous scale (such as successive days or months) may be treated as categories for the application of the tests.
  • Apply Yates’s correction in the chi-square test when there is only one degree of freedom, i.e. in a ‘one-way’ test with two categories or in a 2×2 contingency table (a code sketch follows below).
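A minimal sketch of a chi-square test of independence on a hypothetical 2×2 contingency table (for example, exposure status by disease status); SciPy applies Yates's continuity correction to 2×2 tables by default.

```python
# Chi-square test of independence for a hypothetical 2x2 table:
# rows = exposed / not exposed, columns = disease / no disease.
from scipy.stats import chi2_contingency

observed = [[30, 70],     # illustrative frequencies only
            [15, 85]]

chi2, p, dof, expected = chi2_contingency(observed)  # Yates correction
                                                     # applied for 2x2 tables
print(round(chi2, 2), round(p, 4), dof)
print(expected)           # frequencies expected under independence
```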
Testing the normality of data
Parametric statistical techniques depend upon the mathematical properties of the normal curve: they usually assume that samples are drawn from populations that are normally distributed. Before adopting a statistical test, it is therefore essential to determine whether the data are normal or non-normal. The normality of data can be checked in two ways: by plotting the data to see whether they look normal, or by using formal statistical procedures. The most commonly used test is the Kolmogorov-Smirnov test. If the P value is non-significant (> .05), the data may be treated as normal and a parametric test can be used; if it is significant (< .05), a non-parametric test should be used. The Shapiro-Wilk test is another commonly used test of normality. Statistical packages like SPSS can be used to perform these tests.
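A brief sketch of both normality checks on a hypothetical sample using SciPy; for the Kolmogorov-Smirnov test the sample is compared against a normal distribution with the sample's own mean and standard deviation.

```python
# Checking normality of a hypothetical sample with the Kolmogorov-Smirnov
# and Shapiro-Wilk tests; P > .05 suggests no evidence of non-normality.
import numpy as np
from scipy import stats

data = np.array([4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.1, 6.4, 6.8, 7.2])

ks_stat, ks_p = stats.kstest(data, 'norm',
                             args=(data.mean(), data.std(ddof=1)))
sw_stat, sw_p = stats.shapiro(data)

print(round(ks_p, 3), round(sw_p, 3))   # both > .05 here, so the data may
                                        # be treated as approximately normal
```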

t-test and z-test (comparing means)
In the experimental sciences, comparisons between groups are very common. Usually, one group is the treatment, or experimental, group, while the other is the untreated, or control, group. If patients are randomly assigned to these two groups, it is assumed that they differ only by chance prior to treatment. Differences between the groups after treatment are therefore used to estimate the treatment effect, and the task of the statistician is to determine whether any observed differences between the groups following treatment should be attributed to chance or to the treatment. The t-test is commonly used for this purpose; there are actually several different types of t-tests.

Types of t-Tests
  • Comparison of a sample mean with a hypothetical population mean.
  • Comparison between two scores in the same group of individuals.
  • Comparison between observations made on two independent groups.
The t-test and z-test are parametric inferential statistical techniques used when a comparison of two means is required; they test the null hypothesis that there is no difference in means between the two groups. The reporting of the results of a t-test generally includes the df, the t-value, and the probability level. A t-test can be one-tailed or two-tailed: if the hypothesis is directional, a one-tailed test is generally used; if the hypothesis is non-directional, a two-tailed test is used. A t-test is generally used when the sample size is less than 30 and a z-test when the sample size is more than 30.
There are dependent and independent t-tests, and the formula used to calculate a t-test differs depending on whether the samples involved are dependent or independent. Samples are independent when there are two separate groups, such as an experimental and a control group. Samples are dependent when the participants from the two groups are paired in some manner. The form of the t-test used with a dependent sample may be termed paired, dependent, matched, or correlated (Brockopp & Hastings-Tolsma, 2003).
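A short sketch of the three forms of the t-test listed above, using SciPy on hypothetical data: a one-sample test against a hypothesized population mean, an independent-samples test, and a paired (dependent) test.

```python
# The three common forms of the t-test, on hypothetical data.
from scipy import stats

group_a = [22, 25, 27, 30, 28, 26, 24, 29]     # e.g. experimental group
group_b = [20, 21, 23, 25, 22, 24, 21, 23]     # e.g. control group
before  = [140, 150, 145, 155, 160, 148]       # paired pre-treatment scores
after   = [135, 144, 140, 150, 152, 141]       # same subjects after treatment

print(stats.ttest_1samp(group_a, popmean=25))  # sample mean vs. hypothesized mean
print(stats.ttest_ind(group_a, group_b))       # two independent groups
print(stats.ttest_rel(before, after))          # two related (paired) measurements
```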

Degrees of freedom (df)
Degrees of freedom (df) is a mathematical concept that describes the number of events or observations that are free to vary; for each statistical test there is a formula for calculating the appropriate degrees of freedom (e.g. n-1).

Mann-Whitney U-test
The Mann-Whitney U test is a non-parametric substitute for the parametric t-test, used for comparing the medians of two unmatched samples. For application of the U test, data must be obtained on an ordinal or interval scale. For example, the Mann-Whitney U test can be used to compare the median time taken to perform a task by a sample of subjects who had not drunk alcohol with that of another sample who had drunk a standardized volume of alcohol. The test is used to examine group differences when the data are non-normal and the groups are independent, and it can be applied to groups of equal or unequal size.
Some key points about using Mann-Whitney U-test are:
  • This test can be applied to interval data (measurements), to counts of things, to derived variables (proportions and indices), and to ordinal data (rank scales, etc.)
  • Unlike some test statistics, the calculated value of U has to be smaller than the tabulated critical value in order to reject null hypothesis.
  • The test is for a difference in medians. It is a common error to record a statement like ‘the Mann-Whitney U-test showed there is a significant difference in means’. There is, however, no need to calculate the medians of each sample to do the test (a code sketch follows below).
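A minimal sketch of the alcohol example above with hypothetical task-completion times (in seconds); note that SciPy reports a P value directly, so no comparison with tabulated critical values is needed.

```python
# Mann-Whitney U test on hypothetical task-completion times (seconds)
# for subjects who had or had not drunk a standardized volume of alcohol.
from scipy.stats import mannwhitneyu

no_alcohol = [31, 28, 35, 30, 27, 33, 29, 32]
alcohol    = [38, 41, 36, 44, 39, 42, 37, 40]

u_stat, p_value = mannwhitneyu(no_alcohol, alcohol, alternative='two-sided')
print(u_stat, round(p_value, 4))   # a small P suggests the medians differ
```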
Wilcoxon test (matched pairs)
The Wilcoxon test for matched pairs is a non-parametric test for comparing the medians of two matched samples. It uses a test statistic T whose probability distribution is known. The observations must be made on an interval scale; it is not possible to use this test on purely ordinal measurements. The test is for a difference in medians and assumes that the samples have been drawn from parent populations that are symmetrically, though not necessarily normally, distributed.
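A small sketch of the Wilcoxon matched-pairs signed-rank test in SciPy, assuming hypothetical before and after pain ratings for the same ten patients.

```python
# Wilcoxon matched-pairs signed-rank test on hypothetical before/after
# pain ratings for the same ten patients.
from scipy.stats import wilcoxon

before = [7, 6, 8, 5, 7, 9, 6, 8, 7, 6]
after  = [5, 5, 6, 4, 6, 7, 5, 6, 6, 5]

t_stat, p_value = wilcoxon(before, after)
print(t_stat, round(p_value, 4))   # small P -> the paired medians differ
```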

Pearson Product-Moment Correlation Coefficient
The Pearson product-moment correlation is a parametric test and a common method of assessing the association between two variables. In this test an estimation of at least one parameter is involved, measurement is at the interval level, and it is assumed that the variables under study are normally distributed within the population.

Spearman Rank Correlation Coefficient
Spearman’s rank correlation coefficient (Spearman’s rho) is a non-parametric test equivalent to the parametric Pearson r. It is used when the conditions for the product-moment correlation coefficient do not apply. The test is widely used by health scientists; it uses the ranks of the x and y observations, and the raw data themselves are discarded.
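A brief sketch of both correlation coefficients on hypothetical paired observations; Pearson uses the raw interval-level values, while Spearman works on their ranks.

```python
# Pearson and Spearman correlation coefficients for hypothetical paired
# observations (e.g. anxiety score vs. pulse rate).
from scipy.stats import pearsonr, spearmanr

anxiety = [12, 15, 18, 22, 25, 28, 31, 35]
pulse   = [70, 74, 73, 80, 82, 85, 88, 95]

r, r_p = pearsonr(anxiety, pulse)        # parametric, uses raw values
rho, rho_p = spearmanr(anxiety, pulse)   # non-parametric, uses ranks

print(round(r, 2), round(r_p, 4))
print(round(rho, 2), round(rho_p, 4))
```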

Tips on using correlation tests
  • When observations of one or both variables are on an ordinal scale, or are proportions, percentages, indices or counts of things, use the Spearman rank correlation coefficient. The number of units in the sample, i.e. the number of paired observations, should be between 7 and 30.
  • When observations are measured on an interval scale, the product-moment correlation coefficient should be considered. Sample units must be obtained randomly, and the data should be bivariate normal, i.e. both x and y normally distributed.
  • The relationship between the variables should be rectilinear (straight line) not curved.  Certain mathematical transformations (e.g. logarithmic transformation) will ‘straighten up’ curved relationships.
  • A strong and significant correlation does not mean that one variable is necessarily the cause of the other. It is possible that some additional, unidentified factor is the underlying source of variability in both variables.
  • Correlations measured in samples estimate correlations in the populations.  A correlation in a sample is not ‘improved’ or strengthened by obtaining more observations: however, larger samples may be required to confirm the statistical significance of weaker correlations.

Regression Analysis
Regression analysis is often used to predict the value of one variable given information about another variable. The procedure can describe how two continuous variables are related. Regression analysis is used to examine relationships among continuous variables and is most appropriate for data that can be plotted on a graph. Data are usually plotted, so that the independent variable is seen on the horizontal (x) axis and the dependent variable on the vertical (y) axis. The statistical procedure for regression analysis includes a test for the significance of the relationship between two variables. Given a significant relationship between two variables, knowledge of the value of the independent variable permits a prediction of the value of the dependent variable.
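A minimal sketch of simple linear regression with SciPy on hypothetical data, yielding the slope, intercept, and the significance test of the relationship; the fitted line can then be used for prediction.

```python
# Simple linear regression: predicting a hypothetical dependent variable
# (y) from an independent variable (x); the values are illustrative.
from scipy.stats import linregress

x = [1, 2, 3, 4, 5, 6, 7, 8]                      # independent variable
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]   # dependent variable

fit = linregress(x, y)
print(round(fit.slope, 2), round(fit.intercept, 2), round(fit.pvalue, 6))

x_new = 10
predicted = fit.intercept + fit.slope * x_new     # predicted value of y
print(round(predicted, 1))
```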

One-Way Analysis of Variance (ANOVA)
When there are three or more samples and the data from each sample are thought to be normally distributed, analysis of variance (ANOVA) may be the technique of choice. One-way analysis of variance, developed by R. A. Fisher, is a parametric inferential statistical test that enables investigators to compare two or more group means. The reporting of the results includes the df, the F value and the probability level. ANOVA is of two types: simple (one-way) analysis of variance and complex, or two-way, analysis of variance. One-way ANOVA is an extension of the t-test which permits the investigator to compare more than two means simultaneously.
Researchers studying two or more groups can use ANOVA to determine whether there are differences among the groups. For example, nurse investigators who want to assess the levels of helplessness among three groups of patients (long-term care, acute care, and outpatients) can administer an instrument designed to measure levels of helplessness and then calculate an F ratio. If the F ratio is sufficiently large, the conclusion can be drawn that there is a difference between at least two of the means.
The larger the F ratio, the more likely it is that the null hypothesis can be rejected. Other tests, called post hoc comparisons, can be used to determine which of the means differ significantly. Fisher’s LSD, Duncan’s new multiple range test, the Newman-Keuls test, Tukey’s HSD, and Scheffé’s test are the post hoc comparison tests most frequently used following ANOVA. In some instances a post hoc comparison is not necessary, because the means of the groups under consideration readily convey the differences between the groups (Brockopp & Hastings-Tolsma, 2003).
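A short sketch of a one-way ANOVA for the three hypothetical patient groups described above; SciPy returns the F ratio and its probability. (Post hoc comparisons would require an additional package such as statsmodels and are not shown here.)

```python
# One-way ANOVA comparing hypothetical helplessness scores of three
# groups of patients: long-term care, acute care, and outpatients.
from scipy.stats import f_oneway

long_term  = [42, 45, 48, 50, 47, 44]
acute_care = [38, 40, 41, 39, 42, 37]
outpatient = [30, 33, 31, 35, 32, 34]

f_ratio, p_value = f_oneway(long_term, acute_care, outpatient)
print(round(f_ratio, 2), round(p_value, 5))  # large F, small P -> at least
                                             # two group means differ
```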

Kruskal-Wallis test (more than two samples)
The Kruskal-Wallis test is a simple non-parametric test to compare the medians of three or more samples. Observations may be interval measurements, counts of things, derived variables, or ordinal ranks. If there are only three samples, then there must be at least five observations in each sample. Samples do not have to be of equal sizes. The statistic K is used to indicate the test value.
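A brief sketch of the Kruskal-Wallis test on the same kind of three-group layout, assuming hypothetical data that are not normally distributed.

```python
# Kruskal-Wallis test for three hypothetical independent samples whose
# distributions are not assumed to be normal.
from scipy.stats import kruskal

sample_1 = [12, 15, 14, 18, 30, 16]
sample_2 = [22, 25, 24, 28, 26, 35]
sample_3 = [9, 11, 10, 13, 12, 14]

k_stat, p_value = kruskal(sample_1, sample_2, sample_3)
print(round(k_stat, 2), round(p_value, 4))  # small P -> medians differ
```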
Multivariate Analysis

Two-way or Factorial Analysis of Variance
Factorial analysis of variance permits the investigator to analyze the effects of two or more independent variables on the dependent variable (one-way ANOVA is used with one independent variable and one dependent variable). The term factor is interchangeable with independent variable and factorial ANOVA therefore refers to the idea that data having two or more independent variables can be analyzed using this technique.

Analysis of Covariance (ANCOVA)
ANCOVA is an inferential statistical test that enables investigators to adjust statistically for group differences that may interfere with obtaining results that relate specifically to the effects of the independent variable(s) on the dependent variable(s).

Multivariate Analysis
Multivariate analysis refers to a group of inferential statistical tests that enable the investigator to examine multiple variables simultaneously. Unlike other statistical techniques, these tests permit the investigator to examine several dependent and independent variables simultaneously.

Choosing the appropriate test
If the data fulfil the parametric assumptions, any of the parametric tests which suit the purpose can be used. On the other hand, if the data do not fulfil the parametric requirements, any of the non-parametric statistical tests which suit the purpose can be selected. Other factors which decide the selection of an appropriate statistical test are the number of independent and dependent variables and the nature of the variables (whether nominal, ordinal, interval or ratio). When both independent and dependent variables are interval measures and there is more than one of each, multiple correlation is the most appropriate statistic; when they are interval measures and there is only one of each, Pearson's r may be used. With ordinal and nominal measures, non-parametric statistics are the common choice.

Computer Aided Analysis
The availability of computer software has greatly facilitated the execution of most statistical techniques. The many statistical packages run on different types of platforms or computer configurations. For general data analysis the Statistical Package for the Social Sciences (SPSS), the BMDP series, and the Statistical Analysis System (SAS) are recommended. These are general-purpose statistical packages that perform essentially all the analyses common to biomedical research. In addition, a variety of other packages have emerged.
SYSTAT runs on both IBM-compatible and Macintosh systems and performs most of the analyses commonly used in biomedical research. The popular SAS program has been redeveloped for Macintosh systems and is sold under the name JMP. Other commonly used programs include Stata, which is excellent for the IBM-compatible computers. The developers of Stata release a regular newsletter providing updates, which makes the package very attractive. StatView is a general-purpose program for the Macintosh computer.
Newer versions of StatView include an additional program called Super ANOVA, which is an excellent set of ANOVA routines. StatView is user-friendly and also has superb graphics. For users interested in epidemiological analyses, Epilog is a relatively low-cost program that runs on IBM-compatible platforms. It is particularly valuable for rate calculations, analysis of disease-clustering patterns, and survival analysis. GB-STAT is a low-cost, multipurpose package that is very comprehensive.
SPSS (Statistical Package for Social Sciences) is one among the popular computer programs for data analysis. This software provides a comprehensive set of flexible tools that can be used to accomplish a wide variety of data analysis tasks (Einspruch, 1998). SPSS is available in a variety of platforms. The latest product information and free tutorial are available at www.spss.com.
Computer software programs that provide easy access to highly sophisticated statistical methodologies represent both opportunities and dangers. On the positive side, no serious researcher need be concerned about being unable to utilize precisely the statistical technique that best suits his or her purpose, and to do so with the kind of speed and economy that was inconceivable just two decades ago. The danger is that some investigators may be tempted to employ after-the-fact statistical manipulations to salvage a study that was flawed to start with, or to extract significant findings through use of progressively more sophisticated multivariate techniques.

References & Bibliography
  1. Ahuja R (2001). Research Methods. Rawat Publications, New Delhi. 71-72.
  2. Brockopp D Y & Hastings-Tolsma M (2003). Fundamentals of Nursing Research. 3rd Edition. Jones and Bartlett: Boston.
  3. Cozby P C (2000). Methods in Behavioral Research (7th Edition). Toronto: Mayfield Publishing Co.
  4. Kerr A W, Hall H K, Kozub S A (2002). Doing Statistics with SPSS. Sage Publications, London.
  5. Einspruch E L (1998). An Introductory Guide to SPSS for Windows. Sage Publications, Calf.
  6. Fowler J, Jarvis P & Chevannes M (2002). Practical Statistics for Nursing and Health Care. John Wiley & Sons: England
  7. Guilford, J P (1956). Fundamental Statistics in Psychology and Education. New York: McGraw-Hill Book Co.
  8. Lindquist, E F. (1968). Statistical Analysis in Educational Research.  New Delhi: Oxford and IBH Publishing Co.
  9. Singh A K (2002). Tests, Measurements and Research Methods in Behavioural Sciences. Bharati Bhawan, New Delhi.
  10. Singleton, Royce A. and Straits, Bruce (1999). Approaches to Social Research (3rd Ed), Oxford University Press, New York.
  11. Streiner, D. & Norman, G. (1996). PDQ Epidemiology (2nd Edition). St. Louis: Mosby.
  12. Baker, Therese L (1988). Doing Social Research, McGraw Hill Book Co., New York.
  13. Treece E W & Treece J H (1989). Elements of Research in Nursing, The C.V. Mosby Co.,St.Louis.
  14. Tyler L E (1963). Tests and Measurements. Englewood Cliffs, New Jersey: Prentice Hall.
  15. Chalmers TC, Celano P, Sacks H, Smith H(1983). Bias in treatment assignment in controlled clinical trials. N Engl J Med 309:1358.
  16. Cohen J (1988). Statistical Power Analysis for the Behavioral Sciences. Erlbaum, Hillsdale, NJ.
  17. Cook TD, Campbell DG (1979). Quasi-experimentation: Design and Analysis Issues for Field Studies. Rand-McNally, Chicago.
  18. Daniel WW (1995) Biostatistics: A Foundation for Analysis in the Health Sciences, ed 6. Wiley, New York.
  19. Daniel WW (1990). Applied Nonparametric Statistics, ed 2. PWS-Kent, Boston.
  20. Dawson-Saunders B, Trapp RG (1994) Basic and Clinical Biostatistics, ed 2. Appleton & Lange, Norwalk, CT.
  21. Edwards LK, editor (1993) Applied Analysis of Variance in Behavioral Science. Marcel Dekker, New York.
  22. Efron B, Tibshirani R (1991). Statistical data analysis in the computer age. Science 253:390.
  23. Jaccard J, Becker MA (1997). Statistics for the Behavioral Sciences, ed 3. Brooks/Cole Publishing Co, Pacific Grove, CA.
  24. Keppel G (1991). Design and Analysis. Prentice-Hall, Englewood Cliffs, NJ.
  25. Kaplan RM, Grant I (2000). Statistics and Experimental Design. In Kaplan & Sadock's Comprehensive Textbook of Psychiatry, 7th Edition.
  26. McCall R (1994). Fundamental Statistics for Psychology, ed 6. Harcourt Brace, & Jovanovich, New York.
  27. Pett MA (1997). Nonparametric Statistics for Health Care Research: Statistics for Small Samples and Unusual Distributions. Sage Publications, Thousand Oaks, CA.
  28. Sacks H, Chalmers DC, Smith H (1982). Randomized versus historical controls for clinical trials. Am J Med 72:233.
  29. Ware ME, Brewer CL, editors (1999). Handbook for Teaching Statistics and Research Methods, ed 2. Erlbaum, Mahwah, NJ.
