7PAVITSM Q&A - King's College London

Introduction

Purpose of the session: Q and A about the module and assessments.
Overview: Address pre-submitted questions and open the floor for participant inquiries.

Writing Up Multilevel Models

Consult The Reviewer's Guide to Quantitative Methods in the Social Sciences.
For writing up multilevel models, the key chapter is the one on Hierarchical Linear Modeling.
Chapter 10 provides reporting guidelines, including a checklist in Table 10.1.
Emphasis: The assessment does not require extensive detail on multilevel modeling.
Focus on fixed effects and their interpretation for the assessment.
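
For orientation, a two-level random-intercept model for the assignment data could be written as below. This is a minimal sketch, assuming participants i nested in boroughs j, a generic covariate vector x, and a random intercept only. The notation is an illustrative assumption and was not given in the session.

```latex
\begin{aligned}
\text{phq9}_{ij} &= \beta_0 + \beta_1\,\text{pm25}_{ij}
  + \boldsymbol{\beta}_2^{\top}\mathbf{x}_{ij} + u_j + \varepsilon_{ij},\\
u_j &\sim N(0,\ \sigma_u^2), \qquad
\varepsilon_{ij} \sim N(0,\ \sigma_\varepsilon^2).
\end{aligned}
```

In this notation the fixed effects are the beta terms. Beta_1 is the expected difference in phq9 per one-unit difference in pm25, holding the covariates constant, and u_j captures borough-level variation in mean symptom scores.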

Clarifications on Assessment Questions

Creating Tables

Section 3 allows for separate tables:
- One for missing data (questions a and b).
- Another for summary statistics (question c).
Recommendation: Choose a format that conveys information coherently.

Journal Guidelines for Tables

Formatting options: Any format is acceptable, as long as it’s consistent.
Citations should follow the standard formats outlined in the Programme Handbook.

Advanced Topics in Method Section

Advanced topics may include:
- Understanding of clustering issues and random slopes.
- Sophisticated discussion reflecting broader research awareness.
Example: Justifying inclusion/exclusion of random effects in models.
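
One common way to support a decision about random effects is the intraclass correlation (ICC): the share of total outcome variance that lies between clusters. The sketch below is illustrative only. The variance components are hypothetical numbers, not estimates from the assignment data, and the assignment itself must be completed in Stata or R.

```python
def intraclass_correlation(var_between: float, var_within: float) -> float:
    """Return the share of total variance attributable to clusters (e.g., boroughs)."""
    if var_between < 0 or var_within < 0:
        raise ValueError("variance components must be non-negative")
    total = var_between + var_within
    if total == 0:
        raise ValueError("total variance is zero")
    return var_between / total


# Hypothetical variance components, for illustration only.
icc = intraclass_correlation(var_between=2.0, var_within=18.0)
print(f"ICC = {icc:.2f}")
```

An ICC clearly above zero suggests that observations within a cluster are correlated, which is one argument for including a random intercept.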

Writing Style Recommendations

Voice Usage

Prefer active voice (I did this) over passive voice (it was done).
Use of "we" in collaborative research is acceptable but less preferred.

Including Clustering in Directed Acyclic Graphs (DAGs)

Consider adding clustering variables to the DAG as covariates (this demonstrates advanced thinking).
Include explanations of the relationships between variables in the DAG.
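
As a rough illustration, a DAG can be written down as a mapping from each variable to its assumed direct causes, with the clustering variable included as a node. The edges below are hypothetical and are not a recommended specification. The check simply confirms that the graph contains no cycles.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical DAG, for illustration only.
# Each key is a variable; its value is the set of assumed direct causes (parents).
parents = {
    "borough": set(),
    "age": set(),
    "ses": set(),
    "pm25": {"borough", "ses"},                 # exposure
    "phq9": {"pm25", "borough", "ses", "age"},  # outcome
}


def is_acyclic(parent_map):
    """Return True if the graph described by parent_map has no directed cycle."""
    try:
        # static_order() raises CycleError when a cycle is present.
        list(TopologicalSorter(parent_map).static_order())
        return True
    except CycleError:
        return False


print("Acyclic:", is_acyclic(parents))
```

Writing the edges out this way also gives a natural place to record, edge by edge, the rationale for each assumed relationship.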

Describing Analytical vs. Excluded Samples

Prefer descriptive comparisons without conducting t-tests in Section 3.
Summary statistics can reveal differences without p-values being a primary focus.

Overall Sample Table

Suggestion: A single table showing descriptive statistics for analytical and excluded samples is effective.
Clearly distinguish the statistics for the analytical, excluded, and combined samples.
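
The logic of splitting the sample and describing each part can be sketched as follows. This is an illustration only, using a handful of hypothetical records rather than the assignment dataset. The assignment itself must be completed in Stata or R.

```python
from statistics import mean, stdev

# Hypothetical records; None marks a missing value.
records = [
    {"age": 34, "pm25": 12.1, "phq9": 5},
    {"age": 51, "pm25": None, "phq9": 9},
    {"age": 27, "pm25": 14.3, "phq9": 3},
    {"age": 45, "pm25": 13.0, "phq9": None},
    {"age": 62, "pm25": 11.8, "phq9": 7},
]
required = ["age", "pm25", "phq9"]  # variables used in the model


def is_complete(row, variables):
    """True if the row has a value for every required variable."""
    return all(row[v] is not None for v in variables)


def describe(rows, variable):
    """Return n, mean and SD of the non-missing values of one variable."""
    values = [r[variable] for r in rows if r[variable] is not None]
    n = len(values)
    return {
        "n": n,
        "mean": mean(values) if n else None,
        "sd": stdev(values) if n > 1 else None,
    }


analytical = [r for r in records if is_complete(r, required)]
excluded = [r for r in records if not is_complete(r, required)]

# One row per sample, so the three groups can be read side by side.
for label, rows in [("Analytical", analytical), ("Excluded", excluded), ("Overall", records)]:
    print(label, "N =", len(rows), describe(rows, "age"))
```

Placing the three sets of statistics side by side lets readers judge differences between the samples descriptively, without relying on significance tests.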

Handling Missing Data and DAGs

Theoretical framework of DAGs should remain intact, regardless of data quality.
Address limitations in the analysis that arise from missing variables (e.g., SES).
Example: if SES is unavailable, it becomes a source of unmeasured confounding; reflect on its absence as a limitation.

Reporting and Interpreting Results

Effect Estimates

Report effect estimates on a meaningful scale (e.g., the expected change in the outcome for a realistic change in the exposure).
Avoid arbitrary p-value thresholds; report exact p-values instead.
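
For a binary outcome, one way to follow both points is to convert a logistic regression coefficient to an odds ratio with a confidence interval and to print the exact p-value. The sketch below assumes a Wald (normal-approximation) interval and test. The coefficient and standard error are hypothetical, not results from the assignment data.

```python
import math


def odds_ratio_summary(log_odds, se, z_crit=1.959964):
    """Convert a log-odds coefficient and its SE to an OR, 95% CI and two-sided p-value."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    odds_ratio = math.exp(log_odds)
    lower = math.exp(log_odds - z_crit * se)
    upper = math.exp(log_odds + z_crit * se)
    z = log_odds / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return odds_ratio, lower, upper, p_value


# Hypothetical coefficient and standard error, for illustration only.
or_est, ci_low, ci_high, p = odds_ratio_summary(log_odds=0.25, se=0.10)
print(f"OR = {or_est:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f}), p = {p:.3f}")
```

Reporting the estimate, its interval and the exact p-value together lets readers judge the size and precision of the effect rather than whether it crosses a threshold.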

Table Formatting Best Practices

Tables should be self-explanatory and standalone (include captions and relevant descriptions).
Consider putting extensive or complex tables in supplementary materials.

Conclusion of Q&A Session

Reiterate open channels for questions or clarifications post-session.
Encourage efficient time management as deadlines approach, especially in January.

Instructions

For this assignment, you must conduct and interpret an analysis using Stata or R. You will analyse synthetic data from a fictional study about air pollution and mental health in London, described below.

Key points

1. Make sure to justify your approach and interpret your findings. Do not simply reproduce output from Stata or R. There are multiple ways of analysing the data and no single correct approach.
2. Your answers can combine text, tables and figures. You should choose the most appropriate way to present the findings.
3. Please ensure that the tables and figures are standalone. That means they include appropriate titles, labels, and (where needed) footnotes so readers can understand them without referring to the text.
4. Your tables should be formatted appropriately. Do not copy or screenshot tables directly from Stata or R. Instead, use Microsoft Word (or any other appropriate tool) to format your tables in a style suitable for publication.
5. You can upload a Microsoft Word document (.docx) or a PDF. You can prepare your assignment using Quarto, but you must still submit your work as a Word document (.docx) or a PDF.
6. In addition to submitting your Word document or PDF, you must submit scripts that contain all the commands used in your assignment, including those for data cleaning (see “Section 8”, below). Your scripts should be appropriately annotated with comments. Failure to submit a script will result in a mark deduction. You can submit multiple scripts. To upload your script, create a Zip file containing your Stata do-file or R script and upload it via KEATS.

Formatting

1. Your submission must include the ASMHI Coursework Coversheet.
2. Your submission must be identified by your candidate number only. Do not include any other identifying information (such as your name, K-number, or email address) anywhere in the submission.
3. You must adhere to the formatting guidelines detailed in the ASMHI Programme Handbook.
4. The word limit is 2,600. Please refer to the Programme Handbook for details on penalties for exceeding or falling short of this limit, as well as what text is included in the word count.

Dataset

We have provided you with a synthetic dataset from a fictional study of psychiatric and physical morbidity in London [1]. The study was a cross-sectional survey of 1,834 adults aged 18 years and over residing in the 32 London boroughs between 2008 and 2010. The total population size for the 32 boroughs was 7,653,600 (mid-2009 estimate).

The dataset contains the following variables:
- pid: Participant numeric identifier
- borough: Borough
- age: Participant age at time of completing the survey
- ses: Socioeconomic measure of deprivation (1 = Least deprived; 5 = Most deprived)
- employment: Nature of work (Manual/Office)
- gender: Gender (Female/Male)
- exercise: How many times a week they exercise
- smoker: Smoking status (Smoker/Non-smoker)
- pm25: Particulate matter with an aerodynamic diameter of <2.5 μm, annual average concentrations (μg/m³)
- phq9: Depression symptom score (0 = No symptoms; 27 = All symptoms)

For participants in one of the boroughs (Greenwich, n=85), there is additional information on clinical diagnosis of major depressive disorder (MDD):
- mdd: Has received a diagnosis of major depressive disorder (0 = No; 1 = Yes)

Any categorical values not listed above should be treated as missing values (e.g., “NA”).

[1] Our fictional example is based on a real-life study, the South East London Community Health Study (SELCoH; see Hatch et al. 2016, doi:10.1186/1471-2458-11-861).

This assignment is divided into eight sections. You should structure your report into these sections using the same headings.

Section 1: Background [4 marks; ≈ 300 words]

Write two paragraphs that provide the context for your later analyses. Your first paragraph should briefly summarise the literature linking air pollution and depression. Your second paragraph should describe the research questions addressed in subsequent analyses. You should include a small number of relevant citations.

Section 2: Methods [14 marks; ≈ 400 words]

Describe and justify the methods used in subsequent questions. You should write about each section in turn, for example:

    This analysis is conducted in four sections. In Section 3, I used […] to describe the characteristics of the analytical sample. In Section 4, I used […] to […]. In Section 5, I used […] to […].

You should describe the steps to prepare the data (i.e., data cleaning, such as handling of outliers) and the approach for each analysis section.

Section 3: Descriptive statistics [10 marks; ≈ 400 words]

Describe the sample in writing and with a table (“Table 1”). Your answer should:

a) Describe the extent of missing data in the sample.
b) Describe and compare the analytical sample and the excluded sample. Subsequent analyses will be “complete case” (see Section 4), meaning you analyse participants with complete information on the outcome, exposures, and covariates. Participants with missing information will be omitted using listwise deletion. Therefore, you should describe and compare:
   - Analytical sample: participants with complete data across all required variables;
   - Excluded sample: participants with any missing information.
c) Include a table that uses appropriate summary statistics to describe relevant characteristics for the overall sample. Your write-up should summarise the key points from the table but does not need to repeat every statistic.

Section 4: Linear regression [30 marks; ≈ 500 words]

For this section, you must specify, estimate, and interpret a regression model in Stata or R. Your analysis aims to answer the research question, “What is the effect of air pollution on depressive symptoms?”.

Outcome: phq9, a score measuring symptoms of depression.
Exposure: pm25, a measure of air pollution (particulate matter with an aerodynamic diameter of <2.5 μm; annual average concentrations, μg/m³).

You must:

a) Specify a directed acyclic graph (DAG) for your model. This graph should include all relevant variables from the provided dataset. You should briefly explain your rationale for the specification of your DAG. You may want to consider unobserved variables in your DAG. However, you needn’t address them in your analysis.
b) Use an appropriate regression model to estimate the effect of air pollution (pm25) on depression symptom severity (phq9). Report the estimate and appropriate inferential statistics (e.g., confidence intervals). Interpret your model in relation to the above research question. What evidence have you found?
c) Assess and report the statistical assumptions of your model. Use appropriate plots. Make sure you describe and interpret each plot.

Points to consider:
- Use Section 2 (“Methods”) to explain and justify your approach.
- You should ‘adjust’ your analysis for the minimum adjustment implied by your chosen DAG. You can use tools like DAGitty for this.
- Your analyses should be “complete case”. That means using listwise deletion to include participants with complete information on outcome, exposure and selected covariates. We do not expect you to use missing data methods, such as multiple imputation.
- The survey was conducted in 32 London boroughs. Past evidence has highlighted wide variation in mean levels of depressive symptoms by borough. You should account for this in your analysis.
- As in Section 3, you should report your analysis in a style suitable for publication in an academic journal. That includes using appropriate labels and captions for any tables or figures.

Section 5: Testing for moderation [15 marks; ≈ 400 words]

Some past studies have suggested that the effect of air pollution on mental health depends on smoking. Specifically, there is evidence that the association of air pollution with depressive symptoms is stronger among current smokers compared to non-smokers. By adapting your model from Section 4, test this hypothesis and interpret your findings.

You must:
- Test the hypothesis using an appropriate inferential statistic (e.g., a p-value).
- Report separate effect estimates for smokers and non-smokers.

Points to consider:
- Use Section 2 (“Methods”) to explain and justify your approach.
- You should fully interpret your findings, using a plot if helpful.

Section 6: A binary model for diagnosis [15 marks; ≈ 300 words]

Medical information on a diagnosis of major depressive disorder (MDD) was only available for participants in one borough, Greenwich. Using the DAG you created earlier, specify an appropriate model to estimate the effect of air pollution on MDD diagnosis.

Outcome: mdd, a binary indicator of MDD diagnosis (0 = No; 1 = Yes).
Exposure: pm25, a measure of air pollution (particulate matter with an aerodynamic diameter of <2.5 μm; annual average concentrations, μg/m³).

You must:
- Report effect estimates on a meaningful scale, together with a clear interpretation of the estimate.
- Report the effect estimates using appropriate inferential statistics (e.g., confidence intervals).

Points to consider:
- Your model should only include participants from Greenwich.
- You should ‘adjust’ your analysis for the minimum adjustment implied by the DAG you developed in Section 4, above.
- You should consider the practical significance (as well as the statistical significance) of your findings.

Section 7: Strengths and limitations [4 marks; ≈ 300 words]

Write two paragraphs describing the strengths and limitations of your analysis, respectively. You should consider any assumptions regarding the dataset and your analysis and interpretation.

Section 8: Analysis code [8 marks]

Upload all scripts used in your analysis (i.e., data cleaning and analysis) as a separate attachment with your submission.
- Your script must include all data cleaning and analysis steps used in your answers to previous sections.
- Your script must include a header that includes your candidate number only. Do not include other identifying information (e.g., your name or K-number) and ensure that file paths do not contain your username.
- Your script should be clear and well-organised and follow good programming standards, such as appropriately using comments.