Biostatistics

This assignment will be marked out of 20. It is worth 20% of the total marks for Biostatistics B. It covers the material in modules S5 and S6. It is assumed that you have a clear understanding of the content in BIOS6910.

Question 1 – 8 marks

Artificial heart valves were tested in a mechanical apparatus which measured and controlled the pulse rate and the blood pressure of the heart. The purpose of the experiment was to determine the best valve type of the four studied. The maximum flow gradient (MFG) in mm Hg is the dependent variable of interest. The data are shown in the table below.

Table 1: Maximum flow gradient for four different types of heart valve

Valve type 1   Valve type 2   Valve type 3   Valve type 4
     2              4              6              7
     4              4              5              5
     5              4              5              6
     3              5              8              9
     7              8              9             10
     6              6              7              8
     3              2              5              5
     4              4              5              4
     7              3              6              5
     5              3             10             10
     7              5              9             11
     6              7              8              9

Source: Data obtained from StatLib, Data and Story Library (DASL).

Input the data into Stata and produce parallel boxplots for MFG by valve type. Describe the graph and indicate if there are any differences in MFG between the four valve types. Justify your answer with appropriate summary statistics.

Conduct an appropriate statistical test (from those described in module S5) to determine if there are differences in MFG between valve types (use a 5% significance level). State the hypothesis you are testing, the test statistic from the test you use, your decision and your conclusion. If you find a statistically significant difference between valve types, do further analyses to determine what those differences are. As part of your answer, justify your choice of statistical test.

In your own words, explain what the value of the F-statistic means in terms of variances.
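As a purely illustrative aid, the sketch below computes a one-way ANOVA F statistic by hand, as the ratio of the between-group mean square to the within-group mean square. It is written in Python rather than Stata, uses a small made-up data set (not the valve data), and the function name one_way_f is hypothetical.

```python
from statistics import mean


def one_way_f(groups):
    """Return (MS_between, MS_within, F) for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)          # number of groups
    n = len(all_values)      # total number of observations

    # Between-group sum of squares: spread of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)   # variance estimate from the group means
    ms_within = ss_within / (n - k)     # pooled variance estimate within groups
    return ms_between, ms_within, ms_between / ms_within


# Made-up toy data, for illustration only (NOT the heart valve data)
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ms_b, ms_w, f_stat = one_way_f(toy_groups)
print(ms_b, ms_w, f_stat)
```

For this toy data the between-group mean square is 13 and the within-group mean square is 1, so F is 13: the group means vary far more than would be expected from the variation within groups alone.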

Question 2 – 12 marks

THE CIGARETTE CONSUMPTION PANEL DATA SET

(see

The data set for this question consists of annual data for the 48 continental U.S. states in 1985 and 1995. A description of the variables is given in the table below. Quantity consumed is measured by annual per capita cigarette sales in packs per fiscal year (packpc), as derived from state tax collection data. The price is the average retail cigarette price per pack during the fiscal year, including taxes (avgprs). Income is per capita income (income). The general sales tax is the average tax, in cents per pack, due to the broad-based state sales tax applied to all consumption goods. The cigarette-specific tax is the tax applied to cigarettes only.

Variable name   Description
state           State of the USA
year            Calendar year
cpi             Consumer Price Index (U.S.)
pop             State population
packpc          Number of packs per capita
income          State personal income (total, nominal)
tax             Average state, federal, and average local excise taxes for fiscal year
avgprs          Average price during fiscal year, including sales taxes
taxs            Average excise taxes for fiscal year, including sales taxes

For this question we are interested in the predictors of the consumption of cigarettes and the unit of analysis is the states of the US. We are only interested in the data for 1985 so only use the data for 1985 to answer the following questions.

a) Graph number of packs per capita against pop, income, avgprs and tax (NOT taxs). Based on what you see in the graphs only, which of these variables would you expect to be associated with consumption? Why?

[2 marks]

b) By fitting a series of simple linear regression models, determine which of the variables in part a) are statistically significantly associated with consumption of cigarettes. Make sure to include all appropriate computer output.

[2 marks]

c) Fit a multiple linear regression model with all 4 predictor variables from part a) included and interpret the coefficient of the avgprs variable. Make sure to include all appropriate computer output.

[2 marks]

d) Write down the formula for the relationship between consumption of cigarettes and the independent variables included in the model.

[1 mark]

e) What is the R² value for the multiple regression model? Do you think this is large or small, and can you suggest how it could be improved?

[1 mark]
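For readers who want to see where the R-squared in regression output comes from, the following minimal sketch computes it as one minus the ratio of the residual sum of squares to the total sum of squares. It is an illustration in Python, not the required Stata output; it uses a single predictor and made-up numbers for brevity (the same definition applies to a multiple regression), and the function name r_squared is hypothetical.

```python
from statistics import mean


def r_squared(x, y):
    """R-squared of a least-squares line fitted to y against a single predictor x."""
    x_bar, y_bar = mean(x), mean(y)
    # Ordinary least-squares slope and intercept
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    # Variation left unexplained by the fitted line
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation of y around its mean
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up toy data, for illustration only (NOT the cigarette data)
toy_x = [1, 2, 3, 4]
toy_y = [2, 4, 5, 8]
r2 = r_squared(toy_x, toy_y)
print(round(r2, 3))
```

Here the residual sum of squares is 0.7 and the total sum of squares is 18.75, so about 96% of the variation in the toy outcome is explained by the fitted line.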

f) Are the assumptions of linear regression appropriate for these data? Justify your answer.

[3 marks]

g) Do your conclusions about variables associated with cigarette consumption differ between the simple and multiple linear regressions? Describe any differences and explain why they might have occurred.

[1 mark]
