Objectives
This assessment addresses Unit Learning Outcomes 1, 2, 3 & 4:
- Explain how statistical choices in analysis link directly to the research study design that generated the data, and the type of data,
- Explain the rationale behind hypothesis testing, and the concept of Type I and II errors,
- Differentiate the most appropriate descriptive and inferential statistics to use for common types of health data,
- Analyse health data using a statistical software package, and interpret the results.
Task
Your task is to answer a set of research questions using data from the Framingham Heart Study (sent by e-mail). This data was obtained at a particular point in time (cross-sectional) from adult volunteers recruited from the community and contains a variety of clinical data. Details of the data set and the variables are provided below. Additional information about the Framingham Heart Study, including some published results, can be found in Learning Resources.
Description of Variables
| Variable | Description | Values |
|---|---|---|
| ID | Unique identification number for each participant | 2448 – 9999312 |
| SEX | Participant sex | 1 = Male; 2 = Female; 999 = Missing |
| AGE | Age at exam (years) | 32 – 81; 999 = Missing |
| SYSBP | Systolic Blood Pressure (mean of last two of three measurements) (mmHg) | 83.5 – 295; 999 = Missing |
| DIABP | Diastolic Blood Pressure (mean of last two of three measurements) (mmHg) | 30 – 150; 999 = Missing |
| CIGPDAY | Number of cigarettes smoked each day | 0 – 90; 999 = Missing |
| EDU | Attained education level | 0 = zero to 5years; 1 = 6_to_11years; 2 = High_school_dipl; 3 = Some_college_or_TAFE; 4 = College_degree_or_more; 999 = Missing |
| TOTCHOL | Serum Total Cholesterol (mg/dL) | 107 – 696; 999 = Missing |
| HDLC | High Density Lipoprotein Cholesterol (mg/dL) | 10 – 189; 999 = Missing |
| LDLC | Low Density Lipoprotein Cholesterol (mg/dL) | 20 – 565; 999 = Missing |
| BMI | Body Mass Index, weight in kilograms/height meters squared | 14.43 – 56.8; 999 = Missing |
| GLUCOSE | Casual serum glucose (mg/dL) | 39 – 478; 999 = Missing |
| DIABETES | Diabetic according to criteria of first exam treated or first exam with casual glucose of 200 mg/dL or more | 0 = No; 1 = Yes; 999 = Missing |
| HEARTRTE | Heart rate (Ventricular rate) in beats/min | 37 – 220; 999 = Missing |
| ANGINA | History of Angina Pectoris at exam | 0 = No; 1 = Yes; 999 = Missing |
| ANYCHD | History of Coronary Heart Disease defined as pre-existing Angina Pectoris, Myocardial Infarction (hospitalized, silent or unrecognized), or Coronary Insufficiency (unstable angina) | 0 = No; 1 = Yes; 999 = Missing |
| STROKE | History of stroke | 0 = No; 1 = Yes; 999 = Missing |
| HYPERTEN | History of hypertension. Subject was defined as hypertensive if treated or if second exam at which mean systolic was >=140 mmHg or mean Diastolic >=90 mmHg | 0 = No; 1 = Yes; 999 = Missing |
A cleaned (error-free) subset of the original dataset (which is unique to you) has been sent by e-mail and you will need to use this version for the Assignment. Note: In SPSS you will need to specify which value(s) represent missing values and also the measure type for each variable (e.g. scale, nominal, etc.). These are the only changes/edits you need to make, as value labels are already defined in the dataset.
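The effect of declaring a missing-value code can be pictured outside SPSS. The following is a minimal Python sketch, not SPSS syntax; the helper name `recode_missing` and the sample record are hypothetical and are not taken from the dataset. It assumes that 999 marks a missing value for every variable except ID, whose listed range has no missing code.

```python
MISSING_CODE = 999  # missing-value code listed in the variable table


def recode_missing(record, skip=("ID",)):
    """Return a copy of one participant record with 999 replaced by None.

    Variables named in `skip` are left untouched (assumption: ID has no
    missing code, because the table lists none for it).
    """
    return {
        name: (None if name not in skip and value == MISSING_CODE else value)
        for name, value in record.items()
    }


# Hypothetical record, for illustration only (not real study data)
raw = {"ID": 2448, "AGE": 52, "SYSBP": 999, "CIGPDAY": 0}
clean = recode_missing(raw)
print(clean)
```

If the code were left undeclared, a value of 999 would be treated as a real measurement and would distort means and ranges.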
In your analytical report you are required to answer the following research questions:
Answer all of the questions (Q1, 2, 3, 4 & 5):
Q1: Is there an association between smoking (Yes/No) and hypertension? (Note: you will need to create a new variable called ‘SMOKER’ which will contain two groups (‘Yes’ or ‘No’) using information on the number of cigarettes smoked per day.) In the new variable name include the number of the dataset you have been assigned, e.g. if you have been assigned the dataset ‘Framingham_42.sav’, name the variable ‘SMOKER_42’.
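The logic behind a derived Yes/No variable and a test of association in a 2 x 2 table can be sketched as follows. This is a Python illustration only; the report itself must use SPSS. The function names and the sample values are hypothetical, and the sketch assumes that any non-zero cigarette count defines a smoker, a cut-off the brief does not state. The p-value formula is valid only for a 2 x 2 table (1 degree of freedom).

```python
from collections import Counter
from math import erfc, sqrt


def smoker_status(cigpday):
    """Derive Yes/No smoking status from cigarettes per day.

    Assumption: any count above zero is a smoker; 999 or None is missing.
    """
    if cigpday is None or cigpday == 999:
        return None
    return "Yes" if cigpday > 0 else "No"


def chi_square_2x2(pairs):
    """Pearson chi-square (no continuity correction) for (row, column) pairs."""
    pairs = [(r, c) for r, c in pairs if r is not None and c is not None]
    n = len(pairs)
    cells = Counter(pairs)
    rows = Counter(r for r, _ in pairs)
    cols = Counter(c for _, c in pairs)
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            chi2 += (cells[(r, c)] - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical values, for illustration only (not real study data)
cigpday = [0, 20, 0, 10, 999, 0, 30, 5]
hyperten = [0, 1, 0, 1, 1, 1, 0, 1]
smoker = [smoker_status(c) for c in cigpday]
chi2, p = chi_square_2x2(zip(smoker, hyperten))
print(f"chi-square = {chi2:.3f}, p = {p:.3f}")
```

A sample this small would not satisfy the usual expected-count assumption of the chi-square test; checking that assumption is part of the analysis plan.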
Q2: Are there differences in systolic blood pressure for people who are underweight, normal, overweight or obese? (Note: You will need to create a new variable called ‘BMI_4grps’ using the existing variable ‘BMI’.
Define: BMI < 20.0 = ‘underweight’; 20.0 ≤ BMI < 25.0 = ‘normal’; 25.0 ≤ BMI < 30.0 = ‘overweight’; BMI ≥ 30.0 = ‘obese’.) In the new variable name include the number of the dataset you have been assigned, e.g. if you have been assigned the dataset ‘framingham_42.sav’, name the variable ‘BMI_4grps_42’.
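The BMI cut-offs defined above translate directly into a recoding rule. The following Python sketch is for illustration only (the recoding for the report must be done in SPSS); the function name `bmi_group` is hypothetical, and 999 is treated as missing, as in the variable table.

```python
def bmi_group(bmi):
    """Map a BMI value to one of the four groups defined in Q2."""
    if bmi is None or bmi == 999:
        return None  # missing stays missing
    if bmi < 20.0:
        return "underweight"
    if bmi < 25.0:
        return "normal"       # 20.0 <= BMI < 25.0
    if bmi < 30.0:
        return "overweight"   # 25.0 <= BMI < 30.0
    return "obese"            # BMI >= 30.0


# Boundary values fall into the upper group, as the definitions require
for value in (19.9, 20.0, 24.9, 25.0, 29.9, 30.0, 999):
    print(value, bmi_group(value))
```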
Q3: Is there a difference in serum total cholesterol between males and females?
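One common way to compare a continuous measure between two independent groups, where its assumptions hold, is a t-test. The sketch below computes Welch's version of the statistic (which does not assume equal variances) in Python, for illustration only. The choice of test is an assumption of this sketch and not a prescription from the brief, the function name `welch_t` is hypothetical, and the values are made up.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    va = variance(group_a) / na  # squared standard error, group A
    vb = variance(group_b) / nb  # squared standard error, group B
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical TOTCHOL values (mg/dL) by SEX, 999 already excluded
males = [210, 225, 198, 240, 232]
females = [245, 238, 260, 229, 251]
t, df = welch_t(males, females)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The sketch stops at the test statistic; SPSS additionally reports the p-value and a test of equal variances.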
Q4: Is there an association between a participant’s age and their heart rate?
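For two continuous variables, a correlation coefficient is one way to quantify a linear association. The following Python sketch computes Pearson's r from first principles, for illustration only. The choice of Pearson's r (as opposed to a rank-based alternative) is an assumption that depends on the data meeting its assumptions, and the function name and values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys, missing=999):
    """Pearson correlation using only pairs where neither value is missing."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x != missing and y != missing]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


# Hypothetical AGE (years) and HEARTRTE (beats/min) values
age = [35, 42, 50, 58, 63, 999, 71]
heart_rate = [72, 80, 68, 75, 999, 70, 77]
print(f"r = {pearson_r(age, heart_rate):.3f}")
```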
Q5: Are age, low density lipoprotein cholesterol, serum total cholesterol, casual serum glucose level and body mass index significant predictors of a person’s systolic blood pressure? Which of the variables explains the largest amount of variation in systolic blood pressure?
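The idea behind a multiple linear regression can be sketched with a small least-squares fit. The Python below is an illustration under stated assumptions, not the required SPSS analysis: the function name `fit_ols` is hypothetical, the data are made up, only two of the five predictors are used for brevity, and no p-values or standardised coefficients are computed.

```python
def fit_ols(predictors, outcome):
    """Least-squares fit; returns ([intercept, b1, ..., bk], r_squared)."""
    X = [[1.0] + [float(v) for v in row] for row in predictors]
    y = [float(v) for v in outcome]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    coefs = [A[i][k] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(coefs, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1.0 - ss_res / ss_tot


# Hypothetical complete cases: (AGE, BMI) predicting SYSBP
rows = [(40, 22.0), (50, 27.5), (60, 24.0), (45, 31.0), (70, 26.0), (55, 29.0)]
sysbp = [118, 135, 140, 138, 155, 142]
coefs, r2 = fit_ols(rows, sysbp)
print("intercept, AGE, BMI:", [round(c, 3) for c in coefs], "R^2:", round(r2, 3))
```

R squared is the proportion of variation in the outcome explained by the predictors jointly; comparing individual predictors requires further output, such as the standardised coefficients that SPSS reports.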
For each research question (Q1 to 5) you are required to fully detail an analytical plan, similar to that used in the PUN105 Activity Workbook, Week 6 (page 35).
Please use the marking guide on pages 6 and 7 to guide the extent of the analysis and answers presented for each question.
This should include, at a minimum, the following:
- State the question
- Develop and clearly articulate an analysis plan that will allow you to answer the question
- Implement the analysis plan using SPSS and report all relevant output. If you need to modify or create new variables to implement the plan then you should describe these new / modified variables and how they were calculated.
- Interpret the results of the analysis
- Write a summary paragraph describing the question, the data and the results. Graphics should be incorporated if relevant.
- Tables and figures in the report should be professionally presented with clear numbering, titles and appropriate referencing in the written sections of the report, e.g. Table 1.1 shows the results from a Chi-square test examining the association between …
- The original or raw numerical output from SPSS used to present the results should be presented in an Appendix to the report with clear sections indicating the question and results the output refers to. Failure to provide the original (raw) output from SPSS in an Appendix will result in the assignment being returned to you.
Formatting and Word Limits
Your report should contain a title page clearly identifying the unit code, your name and student number. You should also indicate the word count for each section of your report as outlined below and the file name and number of the dataset you used. The number of your dataset also needs to be included in any new variables you create (if needed in the question). Failure to do so will result in the assignment not being marked.
Each research question should be treated as a separate section in your report and it is expected that you will use appropriate headings within each section.
You are not required to provide a formal introduction, or search any literature or provide references in your analytical report.
The report must:
- have a minimum of 1.5 line spacing, and
- have page margins no smaller than 2 cm
It is expected the report will be well written using professional language and be free from grammatical and spelling errors. The written sections of the report should be no longer than 3,000 words excluding the analysis plans and SPSS output. The word count (excluding the analysis plan and SPSS output) for each section should be stated on the title page of the report.
Marking Criteria
The analytical report will be marked out of a total of 345 marks according to the criteria on pages 6 & 7 (last 2 pages). Please ensure you review the criteria prior to submitting your assessment.
Marking Criteria
| Element | Max. marks |
|---|---|
| Question 1 | (60) |
| • Clear & comprehensive analytical plan to answer question that is technically correct including scientific hypothesis, statistical test & assumptions | 10 |
| • Clear description of new variable created and process used | 10 |
| • All SPSS output included and matches the analytical plan | 10 |
| • Clearly documented evidence that all test assumptions have been tested for validity (& revision of analysis if required) | 10 |
| • Concise & accurate written summary describing the data | 10 |
| • Comprehensive and correct interpretation and reporting of statistical results | 10 |
| Question 2 | (75) |
| • Clear & comprehensive analytical plan to answer question that is technically correct including scientific hypothesis, statistical test & assumptions | 10 |
| • Clear description of new variable created and process used | 10 |
| • All SPSS output included to match the analytical plan | 10 |
| • Clearly documented evidence that all test assumptions have been tested for validity (& revision of analysis if required) | 15 |
| • Concise & accurate written summary describing the data | 10 |
| • Comprehensive and correct interpretation and reporting of statistical results | 20 |
| Question 3 | (50) |
| • Clear & comprehensive analytical plan to answer question that is technically correct including scientific hypothesis, statistical test & assumptions | 10 |
| • All SPSS output included to match the analytical plan | 10 |
| • Clearly documented evidence that all test assumptions have been tested for validity (& revision of analysis if required) | 10 |
| • Concise & accurate written summary describing the data | 10 |
| • Comprehensive and correct interpretation and reporting of statistical results | 10 |
| Question 4 | (50) |
| • Clear & comprehensive analytical plan to answer question that is technically correct including scientific hypothesis, statistical test & assumptions | 10 |
| • All SPSS output included to match the analytical plan | 10 |
| • Clearly documented evidence that all test assumptions have been tested for validity (& revision of analysis if required) | 10 |
| • Concise & accurate written summary describing the data | 10 |
| • Comprehensive and correct interpretation and reporting of statistical results | 10 |
| Question 5 | (90) |
| • Clear & comprehensive analytical plan to answer question that is technically correct including scientific hypothesis, statistical test & assumptions | 20 |
| • Clear description of univariate & bivariate analysis undertaken | 20 |
| • Clearly documented evidence that all test assumptions have been tested for validity, and test all relevant correlations, describe significant single linear relationships with regression and multiple regression | 25 |
| • All SPSS output included to match the analytical plan | 10 |
| • Concise & accurate written summary describing the data | 15 |
| Overall Report | |
| • Written report contains all of the required information and adheres to formatting requirements including maximum prescribed length | 10 |
| • Written report uses professional language to clearly articulate meaning with minimal typographic and grammatical errors | 10 |
| TOTAL (this will be converted to a final mark of 60) | 345 |
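The conversion of the raw total to a final mark can be illustrated with a short calculation. The Python sketch below assumes a simple proportional (linear) scaling from the table total of 345 to 60; the brief does not state the conversion method, and the function name `final_mark` is hypothetical.

```python
RAW_TOTAL = 345    # sum of the maximum marks in the marking criteria
FINAL_TOTAL = 60   # final mark the raw total is converted to


def final_mark(raw_mark):
    """Scale a raw mark out of 345 to a mark out of 60 (assumed linear)."""
    if not 0 <= raw_mark <= RAW_TOTAL:
        raise ValueError("raw mark must be between 0 and 345")
    return raw_mark * FINAL_TOTAL / RAW_TOTAL


# Hypothetical raw mark, for illustration only
print(round(final_mark(276), 1))
```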