Assignment 1

Due date: 19 August 2016, by 4pm

Deadline: 31 August 2016, by 4pm

Instructions:

This assignment consists of three questions. The first question is worth 30 points, and the second and third

questions are each 35 points, so the entire assignment is worth 100 points. Your mark for this

assignment determines 5% of your final grade for this course.

This paper consists of one front page and three pages with questions. There are four pages in total.

Use Stata, and no other software, to perform the calculations for questions 2 and 3. The data sets

that you need can be found on Blackboard. In addition to your answers to the questions, also

include the relevant Stata commands and output, for example by copying and pasting.
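
One way to capture commands together with their output is a Stata log file. The lines below are a minimal sketch, not a required workflow; the log file name assignment1.log is a hypothetical choice.

```stata
* Start a plain-text log that records every command and its output
log using assignment1.log, text replace

* Load one of the assignment data sets and run a command as usual
use waiting.dta, clear
summarize

* Close the log; assignment1.log can then be printed or pasted
log close
```

Copying and pasting directly from the Results window works just as well; the log simply avoids missing a command.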

Assignments must be submitted in hard copy (printed, legibly handwritten, or a combination of

both) via the drop boxes in the School of Economics foyer, which is located on the second floor

of the Merewether Building (H04). All submissions must include a completed, signed and dated

“Individual Assessment Cover Sheet”, which can also be found on Blackboard.

Assignments not submitted on or before the due date stated above are subject to penalty; refer

to sydney.edu.au/arts/current_students/late_work.shtml. That is, two points will be subtracted for

each working day or part thereof that has passed after the due date. Concretely, submissions

received after 4pm on 19 August but before 4pm on 31 August will be subject to a two-point

penalty for each working day or part thereof. After the deadline, assessments cannot be accepted and a mark of 0 will be awarded.

Good luck!

Question 1: (30 points)

The top graph below shows the distribution of the random variable X. The median of X is -5,

the mean of X is 5, the mode of X is -10 and the variance of X is 25. On the lower graph, draw

the distribution of the sample mean of X, $\bar{X}$, assuming a sample size of n = 35. Label

the mean and median of this distribution on the graph, including their numerical values if

possible. Also calculate the variance of $\bar{X}$.
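
As a reminder of the standard results that apply here, and not a worked answer: for a random sample of size n drawn from a population with mean \mu and variance \sigma^2, the sample mean satisfies the relations below. The symbols \mu and \sigma^2 are generic notation introduced for this note and do not appear elsewhere in the paper.

```latex
\[
  E[\bar{X}] = \mu,
  \qquad
  \mathrm{Var}(\bar{X}) = \frac{\sigma^{2}}{n}
\]
```

By the central limit theorem, the distribution of $\bar{X}$ is approximately normal for moderately large n, even when the distribution of X itself is skewed.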

Question 2: (5×7=35 points)

The data set waiting.dta contains data on the amount of time that a random sample of callers

had to wait before their call was answered at a certain call center. All of the data were

obtained last week. For each call, the day of the week is also included.

(a) The manager of this call center claims that on average, callers wait for five minutes before

being answered. Show that this hypothesis cannot be rejected, at 5% significance.

(b) Despite the non-rejection in question 2.a., we still see that the sample mean is more than

five minutes.

Perform a one-sided test to see whether the population mean is more than five minutes.

(c) Statistically speaking, what is wrong with the procedure we followed in question 2.b.?

(d) A technological failure occurred last Wednesday, putting part of the call center offline for

most of the day. Therefore, we decide to exclude Wednesday’s data from consideration. Do

so, and test the manager’s claim again.

(e) If you were this manager’s supervisor, what would you tell him, based on the result you

found in question 2.d.?
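
A minimal sketch of how a one-sample t-test might be run in Stata for this kind of question. The variable names wait (waiting time in minutes) and day (day of the week, assumed here to be stored as a string) are hypothetical; check the actual names and storage types with describe before adapting these lines.

```stata
* Load the data and inspect variable names and storage types
use waiting.dta, clear
describe

* One-sample t-test of H0: mean waiting time = 5 minutes
* (the output reports the two-sided and both one-sided p-values)
ttest wait == 5

* Repeat the test without Wednesday's observations
* (assumes day is a string variable; if it is numeric with value
*  labels, compare against the numeric code instead)
ttest wait == 5 if day != "Wednesday"
```

The if qualifier leaves the data set intact, so the full sample remains available for the earlier parts.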

Question 3: (5×7=35 points)

For this question, use the NSW census data available on Blackboard (nsw_earnings2010.dta).

These data are a 0.01% sample of the NSW population aged 25 to 65 from the year 2010.

Definition of Variables

“hours” gives the individual’s average number of hours worked per week in the previous

year.

“earnings” gives the individual’s annual earnings ($) for the previous year.

(a) Summarize your dataset. What are the average earnings? What are the median

earnings? Judging by these two measures, does the earnings distribution appear to be

symmetric, left-skewed or right-skewed?

Drop any observations for which average hours worked last year is zero. Find the mean and

median again. Which of the two changes more?

NOTE: Use these data (with the zero-hours observations dropped) for the rest of the question.

(b) Using the remaining observations, test the following hypotheses about mean annual

earnings at a significance level of 10%:

: $60,000

: $60,000

In one sentence, describe what you are testing to a person who has no statistical training.

Clearly indicate each step you take, and clearly state your conclusion.

(c) If you want to test whether mean NSW annual earnings are equal to $40,000, what

would your p-value be? Based on your answer, what is the lowest significance level at

which you would reject the null hypothesis that mean annual earnings are equal to

$40,000?

(d) Calculate a 95% confidence interval for the population mean of earnings. In one

sentence, describe what this interval means to a person who has no statistical training.

(e) How would you expect your answers to the previous questions to change if you had a

0.1% sample of NSW residents rather than a 0.01% sample? Explain your answer.
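
A minimal sketch of Stata commands that are commonly used for the steps in this question. It assumes the variables in the file are literally named earnings and hours, as in the definitions above; confirm this with describe. The hypothesized value and confidence level shown are placeholders to adapt to each part.

```stata
use nsw_earnings2010.dta, clear

* Mean, median (50th percentile) and skewness of earnings
summarize earnings, detail

* Drop observations with zero average weekly hours, then summarize again
drop if hours == 0
summarize earnings, detail

* One-sample t-test against a hypothesized mean; level() sets the
* confidence level of the reported interval (the default is 95)
ttest earnings == 60000, level(90)
```

The ttest output lists the sample mean, its standard error, a confidence interval, and p-values for the two-sided and both one-sided alternatives.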