Due date: 19 August 2016, by 4pm
Deadline: 31 August 2016, by 4pm
This assignment consists of three questions. The first question is worth 30 points, and the second
and third questions are worth 35 points each, so the entire assignment is worth 100 points. Your mark for this
assignment determines 5% of your final grade for this course.
This paper consists of one front page and three pages with questions. There are four pages in total.
Use Stata, and no other software, to perform the calculations for questions 2 and 3. The data sets
that you need can be found on Blackboard. In addition to your answers to the questions, also
include the relevant Stata commands and output, for example by copying and pasting.
Assignments must be submitted in hard copy (printed, legibly handwritten, or a combination of
both) via the drop boxes in the School of Economics foyer, which is located on the second floor
of the Merewether Building (H04). All submissions must include a completed, signed and dated
“Individual Assessment Cover Sheet”, which can also be found on Blackboard.
Assignments not submitted on or before the due date stated above are subject to penalty; refer
to sydney.edu.au/arts/current_students/late_work.shtml. That is, two points will be subtracted for
each working day or part thereof that has passed after the due date. Concretely, submissions
received after 4pm on 19 August but before 4pm on 31 August will be subject to this two-point
penalty per working day. After the deadline, assessments cannot be accepted and a mark of 0 will be awarded.
Question 1: (30 points)
The top graph below shows the distribution of the random variable X. The median of X is -5,
the mean of X is 5, the mode of X is -10 and the variance of X is 25. On the lower graph, draw
the distribution of the sample mean of X (X̄), assuming a sample size of n = 35. Label
the mean and median of this distribution on the graph, including their numerical values if
possible. Also calculate the variance of X̄.
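As a reminder for Question 1, the standard results for the sample mean of n independent draws can be written as below. This is a general sketch rather than a worked answer: the symbols μ and σ² (the population mean and variance of X) are notation assumed here, and the normal approximation relies on n being reasonably large.

```latex
\[
E[\bar{X}] = \mu, \qquad
\mathrm{Var}(\bar{X}) = \frac{\sigma^{2}}{n}, \qquad
\bar{X} \sim N\!\left(\mu, \frac{\sigma^{2}}{n}\right)
\ \mbox{approximately, for large } n .
\]
```

Because a normal distribution is symmetric, its mean and median coincide, whatever the shape of the distribution of X itself.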
Question 2: (5×7=35 points)
The data set waiting.dta contains data on the amount of time that a random sample of callers
had to wait before their call was answered at a certain call center. All of the data were
obtained last week. For each call, the day of the week is also included.
(a) The manager of this call center claims that on average, callers wait for five minutes before
being answered. Show that this hypothesis cannot be rejected, at 5% significance.
(b) Despite the non-rejection in question 2.a., we still see that the sample mean is more than
five minutes. Perform a one-sided test to see whether the population mean is more than five minutes.
(c) Statistically speaking, what is wrong with the procedure we followed in question 2.b.?
(d) A technological failure occurred last Wednesday, putting part of the call center offline for
most of the day. Therefore, we decide to exclude Wednesday’s data from consideration. Do
so, and test the manager’s claim again.
(e) If you were this manager’s supervisor, what would you tell him, based on the result you
found in question 2.d.?
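The tests in Question 2 can be run with Stata's one-sample t-test. The sketch below is only an outline: the variable names wait and day are hypothetical (run describe to see the actual names), and it assumes that waiting time is recorded in minutes and that the day of the week is stored as a string.

```stata
* Sketch only: wait and day are hypothetical variable names.
use waiting.dta, clear
describe

* One-sample t-test of H0: mean waiting time = 5.
* The output reports the two-sided p-value and both one-sided p-values.
ttest wait == 5

* Exclude Wednesday's calls (assumes day is a string variable;
* adjust the condition if day is numeric with value labels).
drop if day == "Wednesday"
ttest wait == 5
```

Compare each reported p-value with 0.05 to decide whether to reject at the 5% significance level.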
Question 3: (5×7=35 points)
For this question, use the NSW census data available on Blackboard (nsw_earnings2010.dta).
These data are a 0.01% sample of the NSW population aged 25 to 65 from the year 2010.
Definition of Variables
“hours” gives the individual’s average number of hours worked per week in the previous year.
“earnings” gives the individual’s annual earnings ($) for the previous year.
(a) Summarize your dataset. What are the average earnings? And what are the median
earnings? By looking at these two measures, does the earnings distribution appear to be
symmetric, left-skewed or right-skewed?
Drop any observations for which average hours worked last year is zero. Find the mean and
median of earnings again. Which one shows the larger change?
NOTE: Use these data (with the zero-hours observations dropped) for the rest of the question.
(b) Using the remaining observations, test the following hypothesis about mean annual
earnings using a significance level of 10%
In one sentence, describe what you are testing to a person who has no statistical training.
Clearly indicate each step you take, and clearly state your conclusion.
(c) If you want to test whether mean NSW annual earnings are equal to $40,000, what
would your p-value be? Based on your answer, what is the lowest significance level at
which you would reject the null hypothesis that mean annual earnings are equal to $40,000?
(d) Calculate a 95% confidence interval for the population mean of earnings. In one
sentence, describe what this interval means to a person who has no statistical training.
(e) How would you expect your answers to the previous questions to change if you had a
0.1% sample of NSW residents rather than a 0.01% sample? Explain your answer.
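For Question 3, one possible sequence of Stata commands is outlined below. It is a sketch, not the required method: it uses the variable names hours and earnings given in the definitions above, and it covers only the summary statistics, the test against $40,000 and the confidence interval.

```stata
* Sketch only: one possible sequence of commands for Question 3.
use nsw_earnings2010.dta, clear

* Mean and percentiles of earnings; the 50% row is the median.
summarize earnings, detail

* Drop observations with zero hours worked, then summarize again.
drop if hours == 0
summarize earnings, detail

* One-sample t-test of H0: mean earnings = 40000; the output reports the p-value.
ttest earnings == 40000

* Estimated mean of earnings with a 95% confidence interval.
mean earnings, level(95)
```

The level() option can be changed, for example to level(90), if an interval matching a different significance level is wanted.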