Testing to Find Relationships Among Many Variables
Both multiple regression and logistic regression are used to evaluate the relative predictive contribution of each of several independent variables to a dependent variable. When a researcher, guided by common sense and evidence from the literature, selects a narrow set of independent variables believed to be important or useful in predicting an outcome (the dependent variable), the researcher is said to be building a predictive model to explain the phenomenon being studied.
Preparation
You are encouraged to review the multiple and logistic regression materials from previous weeks. Then, review How to Choose a Statistical Test and the test-selection tutorials linked in the Resources to determine which test is most likely to be appropriate for your data type.
Instructions
Using the Framingham study data set, perform and interpret statistical tests that answer the following research questions. Then, provide a written analysis of your results.
Research Question 1: Demonstrate how baseline BMI, age, and smoking status (variables: bmi1, age1, cursmoke1) can be used to predict baseline glucose (variable: glucose1).
Research Question 2: How do baseline glucose, cholesterol, systolic blood pressure, and BMI (variables: glucose1, totchol1, sysbp1, and bmi1) affect the likelihood that a participant will have coronary heart disease by the time of the third examination (variable: prevchd3)?
Perform the appropriate statistical tests (based on the results of assumption testing).
Provide your rationale for test selection.
Interpret the results of your statistical tests for each research question.
Consider associated caveats and limitations.
Explain how either multiple or logistic regression statistical techniques might be used to understand a complex system in public health. Provide a 1–2-paragraph explanation, with 1–2 supporting references.
Write clearly and concisely, using correct grammar, mechanics, and APA formatting.
Write for an academic audience, using appropriate statistical terminology, style, and form.
Express your main points and conclusions coherently.
Proofread your writing to minimize errors that could distract readers and make it more difficult for them to focus on the substance of your statistical analysis.
MLR estimates the proportion of variance in $\text{glucose1}$ explained by the combined set of predictors ($\text{R}^2$) and the unique predictive contribution of each variable (its standardized $\beta$ coefficient) while controlling for the others.
Interpretation Framework (Hypothetical Results)
Hypothesis: Baseline BMI, age, and smoking status significantly predict baseline glucose levels.
MLR Output Component: Interpretation
Model Summary ($\text{R}^2$): The $\text{R}^2$ indicates the proportion of total variability in $\text{glucose1}$ explained by the model (e.g., if $\text{R}^2 = 0.35$, $35\%$ of the variation in glucose is explained by the predictors).
ANOVA F-test: A statistically significant F-statistic ($p < 0.05$) indicates that the overall model predicts $\text{glucose1}$ significantly better than using the mean of $\text{glucose1}$ alone.
Coefficients ($\beta$ and $p$-value), $\text{age1}$: Read as an unstandardized coefficient, a positive, significant $\beta$ suggests that for every one-year increase in age, $\text{glucose1}$ is predicted to increase by $\beta$ units, holding $\text{bmi1}$ and $\text{cursmoke1}$ constant.
Coefficients ($\beta$ and $p$-value), $\text{cursmoke1}$: A significant $\beta$ gives the adjusted mean difference in $\text{glucose1}$ between smokers ($\text{cursmoke1}=1$) and non-smokers ($\text{cursmoke1}=0$), holding $\text{bmi1}$ and $\text{age1}$ constant.
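To make the output components above concrete, here is a minimal Python sketch that fits an ordinary least squares model and computes $\text{R}^2$ and the overall F-statistic by hand. It is an illustration under stated assumptions, not the analysis itself: the eight rows are invented toy values (not Framingham data), the helper names `solve_linear_system`, `fit_ols`, and `model_summary` are hypothetical, and in practice a statistics package would also report the $p$-values and standard errors.

```python
# Toy rows invented purely for illustration; these are NOT Framingham values.
# Each row: (bmi1, age1, cursmoke1, glucose1)
toy_rows = [
    (22.1, 39, 0, 72), (28.7, 46, 0, 80), (25.3, 48, 1, 70), (30.2, 61, 1, 95),
    (23.8, 52, 0, 78), (27.5, 43, 1, 76), (33.0, 63, 0, 102), (24.6, 45, 1, 74),
]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(predictors, outcome):
    """Return [intercept, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    X = [[1.0] + [float(v) for v in row] for row in predictors]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def model_summary(predictors, outcome):
    """Return (coefficients, R^2, overall F-statistic) for an OLS fit."""
    coefs = fit_ols(predictors, outcome)
    fitted = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))
              for row in predictors]
    mean_y = sum(outcome) / len(outcome)
    sse = sum((y - f) ** 2 for y, f in zip(outcome, fitted))  # unexplained
    sst = sum((y - mean_y) ** 2 for y in outcome)             # total
    n, k = len(outcome), len(coefs) - 1
    r2 = 1 - sse / sst
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    return coefs, r2, f_stat


predictors = [row[:3] for row in toy_rows]
glucose = [row[3] for row in toy_rows]
coefs, r2, f_stat = model_summary(predictors, glucose)

for name, b in zip(["intercept", "bmi1", "age1", "cursmoke1"], coefs):
    print(f"{name}: {b:.3f}")  # unstandardized coefficients
print(f"R^2 = {r2:.3f}, F = {f_stat:.3f}")
```

The coefficients printed here are unstandardized, so each one reads as the predicted change in the outcome per one-unit change in that predictor with the others held constant. The F-statistic compares the fitted model against the intercept-only (mean) model.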
Caveats and Limitations
Assumptions: MLR relies on key assumptions, including linearity (each predictor has a straight-line relationship with the outcome), homoscedasticity (the variance of the residuals is equal across predicted values), and normally distributed residuals. Violations of these assumptions, which are common with health variables like glucose, may require data transformation or the use of non-parametric methods.
Multicollinearity: If $\text{bmi1}$ and $\text{age1}$ are highly correlated, the standard errors of their coefficients can be inflated, making it difficult to determine the unique contribution of each.
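One simple way to screen for the multicollinearity concern above is to compute the correlation between the two continuous predictors and the corresponding variance inflation factor (VIF). The sketch below is a hedged illustration: the values are invented toy data (not Framingham data), the function names are hypothetical, and the formula $1/(1-r^2)$ equals the VIF only when these two are the sole predictors. With three or more predictors, each VIF comes from regressing one predictor on all of the others.

```python
from math import sqrt

# Toy values invented for illustration only; NOT Framingham data.
toy_bmi1 = [22.1, 28.7, 25.3, 30.2, 23.8, 27.5, 33.0, 24.6]
toy_age1 = [39, 46, 48, 61, 52, 43, 63, 45]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF for a model whose only predictors are x and y: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)


r = pearson_r(toy_bmi1, toy_age1)
vif = vif_two_predictors(toy_bmi1, toy_age1)
print(f"r(bmi1, age1) = {r:.3f}, VIF = {vif:.3f}")
```

A VIF of 1 means no inflation. Commonly cited rules of thumb treat values above roughly 5 or 10 as a sign of problematic multicollinearity, although these cutoffs are conventions rather than formal tests.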
Sample Answer
Statistical Analysis Plan and Interpretation Framework
Based on the research questions and the nature of the variables, two distinct types of regression analysis are required.
Research Question 1: Predicting Baseline Glucose
Question: Demonstrate how baseline BMI, age, and smoking status ($\text{bmi1}$, $\text{age1}$, $\text{cursmoke1}$) can be used to predict baseline glucose ($\text{glucose1}$).
Rationale for Test Selection
The dependent variable, baseline glucose ($\text{glucose1}$), is continuous (scale-level, measured in $\text{mg/dL}$). The independent variables are a mix of continuous ($\text{bmi1}$, $\text{age1}$) and dichotomous ($\text{cursmoke1}$). When the outcome variable is continuous and there are multiple predictors, the appropriate statistical test is multiple linear regression (MLR).
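Written out, the model implied by this rationale takes the standard MLR form below. The subscript $i$ (indexing participants), the intercept $\beta_0$, and the error term $\varepsilon_i$ are notation assumed here for illustration; they do not appear in the assignment text.

$$\text{glucose1}_i = \beta_0 + \beta_1\,\text{bmi1}_i + \beta_2\,\text{age1}_i + \beta_3\,\text{cursmoke1}_i + \varepsilon_i$$

Because $\text{cursmoke1}$ is coded 0 or 1, $\beta_3$ is the adjusted difference in mean glucose between smokers and non-smokers, while $\beta_1$ and $\beta_2$ are slopes per unit of BMI and per year of age.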