Chi-square always sig. for large sample sizes (500+), hence type II error. Can look at RMSEA.
"RMSEA values of < 0.2 with sample sizes of 500+, and certainly 1000+, may indicate that the data do not underfit the model, and that the chi-square was inflated by sample size." - http://www.rasch.org/rmt/rmt254d.htm
MacCallum, Browne and Sugawara (1996) have used 0.01, 0.05, and 0.08 to indicate excellent, good, and mediocre fit, respectively. However, others have suggested 0.10 as the cutoff for poor fitting models. These are definitions for the population. That is, a given model may have a population value of 0.05 (which would not be known), but in the sample it might be greater than 0.10. - http://davidakenny.net/cm/fit.htm
Fit index ranges for structural equation modeling - http://www.psych-it.com.au/Psychlopedia/article.asp?id=277
Assess whole SEM - chi sq and fit index - http://zencaroline.blogspot.hk/2007/04/global-model-fit.html
- chi-sq p>.05, acceptable model fit
- penalty of model complexity (?)
Chi sq test
- H0: the proposed model holds in the population (sample covariance matrix = population covariance matrix) --> do NOT want to reject the H0!
-- the bigger the p the better
Reports of chi squre should be accompanied by degrees of freedom, sample size, and p-value. Example, χ2 (48, N=500)= 303.80, p < .001, TLI=.86, CFI=.90; or χ2 (15, N=2232)=10.91,p=.77Chi square is highly sensitive to departures from multivariate normality.
The χ2 associated with the model # is significant, χ2 (df, N=2232)=#, p=0.000, which suggests that the model is not consistent with the observed data.
Nonsignificant— χ2 (15, N=2232)=10.91,p=.77, suggesting that the proposed model is consistent with the observed data
- χ2 is sensitive to sample size. With large sample size, the chi-square values will be inflated (statistically significant), thus might erroneously implying a poor data-to-model fit (Schumacker & Lomax, 2004).
- Small sample sizes - not enough power to detect the differences between several competing models using the chi squre statistic for model selection or evaluation.
- Larger sample sizes - power may be so high --> poor model fit --> rejected models with only trivial misspecifications.
- Large, complex problems (many variables and df), the observed chisquare will nearly always be statistically significant, even when there is a reasonably good fit to the data.
- Relative chi-square (or normal chi-square) = chi-square fit index / df
- Make chi-sq test less dependent on sample size.
- Wheaton (1987) advocated CMIN/DF not be used.
- In the range of 2 to 1 or 3 to 1 indicate acceptable fit between the hypothetical model and the sample data (Carmnines&McIver,1981).
- Different researchers have recommended using ratio as low as 2 or as high as 5 to indicate a reasonable fit (Marsh&Hocevar,1985).
- A chi-squre/df ratio larger than 2 indicates an inadequate fit (Byrne,1989). chi-square/df ratio values lower than 2 are widely considered to represent a minimally plausible model (Byrne,1991, The Maslach Burnout Inventory: validating factorial structure and invariance across intermediates, secondary, and university educators. Multivariate Behavioral Research, 26 (4), 583-605)
- The smaller the
Chi-square, the better the fit of the model. It has been suggested that a
Chi-square two or three times as large as the degrees of freedom is
acceptable (Carmines
& McIver, 1981), but the fit is considered better the closer the Chi-square value
is to the degrees of freedom for a model (Thacker, Fields & Tetrick, 1989). In
the present sample, it was suggested that a ratio of 5 to 1 was “a useful rule
of thumb” (Jackson et al., 1993, p. 755). -- cf Timothy R. Hinkin (1995)
A Review of Scale Development Practices in the Study of Organizations.
Journal of Management, Vol. 21, No. 5.967-988 - However, Chi-square test may be misleading. 1) The more complex the model, the more likely a good fit (i.e., the closer the researcher's model is to being just-identified, the more likely good fit will be found). 2) The larger the sample size, the more likely the rejection of the model and the more likely a Type II error (rejecting something true). In very large samples, even tiny differences between the observed model and the perfect-fit model may be found significant. 3) The chi-square fit index is also very sensitive to violations of the assumption of multivariate normality. When this assumption is known to be violated, the researcher may prefer Satorra-Bentler scaled chi-square, which adjusts model chi-square for non-normality.
- To address the limitations of chi-squre test, goodness-of-fit indexes as adjuncts to the chi-squre statistic are used to assess model fit
- Model with many variables and small samples may be more inclined to experience degradation in absolute fit indexes than models with many variables and large sample size.
- RMR(root mean square residual), the smaller the RMR, the better the model. An RMR of zero indicates a perfect fit. The closer the RMR to 0 for a model being tested, the better the model fit. RMR smaller than 0.05 indicates good fit.
- SRMR (standardized RMR, root mean square residual)-- SRMR < = .05 means good fit, The smaller the SRMR, the better the model fit. SRMR = 0 indicates perfect fit. A value less than .08 is considered good fit. SRMR tends to be lower simply due to larger sample size or more parameters in the model. To get SRMR in AMOS, select Analyze, Calculate Estimates as usual. Then Select Plugins, Standardized RMR: this brings up a blank Standardized RMR dialog. Then re-select Analyze, Calculate Estimates, and the Standardized RMR dialog will display SRMR.
- GFI should by equal to or greater than .90 to indicate good fit. GFI is less than or equal to 1. A value of 1 indicates a perfect fit. GFI tends to be larger as sample size increases. GFI> 0.95 indicates good fit. GFI index is roughly analogous to the multiple R square in multiple regression in that it represents the overall amount of the covariation among the observed variables that can be accounted for by the hypothesized model.
- AGFI (adjusted GFI), AGFI adjusts the GFI for degree of freedom, resulting in lower values for models with more parameters. AGFI should also be at least .90, close to 1 indicates good fit. AGFI may underestimate fit for small sample sizes. AGFI's use has been declining and it is no longer considered a preferred measure of goodness of fit. AGFI > 0.9 indicates good fit.
- CI (centrality index)--CI should be .90 or higher to accept the model.
- CAK
- CK (single sample cross-validation index)
- MCI (centrality index
- CN
Baseline Comparisons-- comparing the given model with an alternative model
- CFI (comparative fix index), close to 1 indicates a very good fit, > 0.9 or close to 0.95 indicates good fit, by convention, CFI should be equal to or greater than .90 to accept the model. CFI is independent of sample size. CFI is more appropriate than NFI in finite samples. NFI behaves erratically across ML and GLS, wheresas CFI behaved consistenly across the two estimation methods. CFI is recommended for routine use. Gerbing and Anderson (1993) recommended RNI and CFI, DELTA2 (IFI). When the sample size is small, both the CFI and TLI decrease as we increase the number of vairables in the models.
- RNI, RNI is recommended for routine use. RNI is generally preferred over TLI. RNI> 0.95 indicates good fit.
- BBI (Bentler-Bonett index), should be greater than .9 to consider fit good.
- IFI (incremental fit index,also known as DELTA2), IFI should be equal to or greater than .90 to accept the model. IFI value close to 1 indicates good fit. IFI can be greater than 1.0 under certain circumstances. IFI is not recommended for routine use.
- NFI (normed
fit index, also known as the Bentler-Bonett normed fit index,DELTA1), 1
= perfect fit. NFI values above .95 are good, between .90 and .95
acceptable, and below .90 indicates a need to respecify the model. NFI
greater than or equal to 0.9 indicates acceptable model fit. NFI less
than 0.9 can usually be improved substantially. Some authors have used
the more liberal cutoff of .80. NFI
may underestimate fit for small samples. NFI does not reflect parsimony:
the more parameters in the model, the larger the NFI coefficient, which
is why NNFI (TLI) below is now preferred (NNFI incorporates a
correction for model complexity, whereas the NFI does not). NFI depends
on sample size, values of the NFI will be higher for larger sample
sizes. NFI behaves erratically across estimation methods under
conditions of small sample size. NFI is not a good indicator for
evaluating model fit when the sample size is small.
NFI suggested relatively poorer model fit as missing data increased, with the bias generally more pronounced when data were MAR than when they were MCAR. Whereas NFI is still widely used, it is typically not among the recommended indices in recent reviews. Marsh et al., (1988) recommended against using NFI and in favor of TLI, because NFI, not TLI, is sensitive to sample size. When the sample size is small, both the CFI and TLI decrease as we increase the number of variables in the model. - NNFI(non-normed fit index,also called the Bentler-Bonett non-normed fit index, the Tucker-Lewis index, TLI,RHO2), NNFI is similar to NFI, but penalizes for model complexity. NNFI is not guaranteed to vary from 0 to 1. It is one of the fit indexes less affected by sample size. NNFI close to 1 indicates a good fit. TLI greater than or equal to 0.9 indicates acceptable model fit. By convention, NNFI values below .90 indicate a need to respecify the model. TLI less than 0.9 can usually be improved substantially. Some authors have used the more liberal cutoff of .80 since TLI tends to run lower than GFI. However, more recently, Hu and Bentler (1999) have suggested NNFI >= .95 as the cutoff for a good model fit. TLI is not associated with sample size. NNFI is recommended for routine use. NNFI is a more useful index than NFI. Hu and Bentler (1998,1999) support the continued use of TLI because TLI is relatively insensitive to sample size; TLI is sensitive to model missipecifications; is relatively insensitive to violations of assumptions of multivariate normality; is relatively insensitive to estimation method (maximum likelihood vs alternaitve methods). RNI is generally preferred over TLI.
- NTLI, NTLI is recommended for routine use.
- RFI (relative fit index, RHO1) is not guaranteed to vary from 0 to 1. RFI close to 1 indicates a good fit. Neither the NFI nor the RFI are recommended for routine use.
- PRATIO (parsimony ratio)
- RMSEA (root mean square error of approximation),there is good model fit if RMSEA less than or equal to .05. There is adequate fit if RMSEA is less than or equal to .08. More recently, Hu and Bentler (1999) have suggested RMSEA <= .06 as the cutoff for a good model fit. RMSEA is a popular measure of fit. Less than .05 indicates good fit, =0.0 indicates exact fit, from .08 to .10 indicates mediocre fit, greater than .10 indicates poor fit. RMSEA is judged by a value of .05 or less as an indication of a good fit. A value of .08 or less is indicative of a “reasonable” error of approximation such that a model should not be used if it has an RMSEA greater than .1. Hu and Bentler (1995) suggested values below .06 indicate good fit. The RMSEA values are classified into four categories: close fit (.00–.05), fair fit (.05–.08), mediocre fit (.08–.10), and poor fit (over .10). RMSEA smaller than 0.05 indicates good fit. RMSEA tends to improve as we add variables to the model, expecially with larger sample size. One limitation of RMSEA is that it ignores the complexity of the model. The lack of fit of the hypothesized model to the population is known as the error of approximation. The RMSEA is a standardized measure of error of approximation. RMSEA value of .05 or less indicates a close approximation, values of up to .08 suggests a reasonable fit of the model in the population.
- PCLOSE tests the null hypothesis that RMSEA is no greater than .05. If PCLOSE is less than .05, we reject the null hypothesis and conclude that the computed RMSEA is greater than .05, indicating lack of a close fit.
- PGFI (parsimony goodness of fit index)
- PNFI (parsimony normed fit index),There is no commonly agreed-upon cutoff value for an acceptable model.
- PCFI (parsimony comparative fit index),There is no commonly agreed-upon cutoff value for an acceptable model.
Information criteriosn index, goodness of fit measures based on information theory (do not have cutoffs like .90 or .95. Rather they are used in comparing models, with the lower value representing the better fit.)
- CAK
- CK
- MCI (McDonald's centrality index)
- CN(Hoelter's ctritical N)
- AIC (Akaike Information Criterion, single sample cross-validation index), the lower the AIC measure, the better the fit.
- AIC0, AMOS Specification Search tool by default rescales AIC so when comparing models, the lowest AIC coefficient is 0. For the remaining models, AIC0 <= 2, no credible evidence the model should be ruled out; 2 - 4, weak evidence the model should be ruled out; 4 - 7, definite evidence; 7 - 10 strong evidence; > 10, very strong evidence the model should be ruled out.
- CAIC (Consistent AIC),the lower the CAIC measure, the better the fit.
- BCC (Browne-Cudeck criterion, also called the Cudeck & Browne single sample cross-validation index) It should be close to .9 to consider fit good. BCC penalizes for model complexity (lack of parsimony) more than AIC.
- ECVI (Expected cross-validation index, single sample cross-validation index), in its usual variant is equivalent to BCC, and is useful for comparing non-nested models, lower ECVI is better fit. EVIC can be used to compare non-nested models and allows the determination of which model will cross-validate best in anohter sample of the same size and simliarly selected. Choose the model that has the lowest ECVI.
- MECVI,a variant on BCC,except for a scale factor, MECVI is identical to BCC
- BIC (Bayesian Information Criterion, also known as Akaike's Bayesian Information Criterion (ABIC) and the Schwarz Bayesian Criterion (SBC).compared to AIC, BCC, or CAIC, BIC more strongly favors parsimonious models with fewer parameters. BIC is recommended when sample size is large or the number of parameters in the model is small. Recently, however, the limitations of BIC have been highlighted.
- BIC0,the AMOS Specification Search tool by default rescales BIC so when comparing models, the lowest BIC coefficient is 0. For the remaining models, the Raftery (1995) interpretation is: BIC0 <= 2, weak evidence the model should be ruled out; 2 - 4, positive evidence the movel should be ruled out; 6 - 10, strong evidence; > 10, very strong evidence the model should be ruled out.
- BICp. BIC can be rescaled so Akaike weights/Bayes factors sum to 1.0. In AMOS Specification Search, this is done in a checkbox under Options, Current Results tab. BICp values represent estimated posterior probabilities if the models have equal prior probabilities. Thus if BICp = .60 for a model, it is the correct model with a probability of 60%. The sum of BICp values for all models will sum to 100%, meaning 100% probability the correct model is one of them, a trivial result but one which points out the underlying assumption that proper specification of the model is one of the default models in the set. Put another way, "correct model" in this context means "most correct of the alternatives."
- BICL. BIC can be rescaled so Akaike weights/Bayes factors have a maximum of 1.0. In AMOS Specification Search, this is done in a checkbox under Options, Current Results tab. BICL values of .05 or greater in magnitude may be considered the most probable models in "Occam's window," a model-filtering criterion advanced by Madigan and Raftery (1994).
- Quantile or Q-Plots
- IES (Interaction effect size),IES is a measure of the magnitude of an interaction effect (the effect of adding an interaction term to the model). In OLS regression this would be the incremental change in R-squared from adding the interaction term to the equation. In SEM, IES is an analogous criterion based on chi-square goodness of fit. Recall that the smaller the chi-square, the better the model fit. IES is the percent chi-square is reduced (toward better fit) by adding the interaction variable to the model.
- residual is the difference between the sample matrix (S) and population matrix ( ∑ ). Standardized residuals are residuals that have been standardized to have a mean of zero and a standard deviation of one, making them easier to interpret. Standardized residuals larger than absolute value 2.0 are considered to be suggestive of a lack of fit.
- http://www2.chass.ncsu.edu/garson/pa765/structur.htm#extract
- Although AMOS provides a considerably broader range of fit indexes, it does not permit access to the information matrix or residuals with incomplete data.
No comments:
Post a Comment