Final Paper

The final paper should be a well-crafted quantitative analysis of a an original research question of 8 to 12 pages.

It should employ the relevant, but not necessarily all, of the techniques covered in this course—data visualization, descriptive statistics, exploratory data analysis, statistical tests, linear regression—to answer the research question that was posed.

While the student is encouraged to find interesting and original research questions that could eventually lead larger projects, the final paper will be evaluated on how well it is crafted and the appropriate use of methodologies to answer the research question as it was posed.

All analysis presented must be reproducible and the manuscript must be formatted according the guidelines.

Reproducibility policy

All data and code necessary to reproduce the analyses presented in the paper must be submitted along with the paper. The data and analyses should be well documented and understandable to a reader.

The analysis code must be clean, parsimonious and well-organized, that is, you should submit only the necessary code needed to reproduce what you present in the paper. You are encouraged to use literate programming tools such as R Markdown and knitr.

Main Text

Your final paper should be between 8 - 12 pages all text (the abstract is text), tables and figures, notes, and references included in the manuscript. Font size must be a standard 12 point serif font. The manuscript must be double spaced for all body text including the references, but text in tables and figures may be single spaced. Page margins must be one inch.

The submission must include a title page with the title of the paper, the author’s name, date, an abstract of no more than 150 words. The abstract should concisely state the literature to which the paper contributes, the specific topic it addresses, the methodology employed for the analysis, the results of the analysis, and the implications of the findings.

The manuscript should present its content as efficiently as possible. You do not need to include a literature review; state the research puzzle, your theory, and how it compares to previous approaches. Cite the directly relevant works where necessary to support your argument.

Numbers, Tables, and Figures

Tables and figures should be included where they fall in the text, and not in an appendix. Tables and figures should be formatted for publication. In particular do not use the direct simple output of R. And, of course do not take screen-shots of the output. Rather, use packages like starazer, pander, kable, xtable. At the minimum if you end up copy-pasting the simple output make sure to clean it up and format it on the paper. The tables should be clear, readable, and easily understandable.

Do not use acronyms or computational abbreviations when discussing variables in the text. All variables that appear in tables or figures should be described in appropriate detail in the text. That is, do not use variable names, which only you will understand. For example, instead of using V23TH which was used to describe gender in the original data-set use the name Female which would be understandable even if the reader has never looked at the code-book.

You are encouraged to use graphics to illustrate your main argument and research findings. If color is used, then it should be used appropriately, following the best practices in data visualization. Make sure to label everything appropriately. Do not leave variables names or part of the coding. For example, legends that read factor(V95M61) should be coded properly beforehand so that the legend in the graph is simply Level of Eduction

Numbers used in the manuscript text, figures, and tables should be reported with no more precision than they are measured with or is substantively meaningful. A good rule of thumb is to use no more than 2 digits unless there is good reason to do so. Do not report more significant digits than the standard errors imply. Variables should be scaled to make the reporting of results as straightforward as possible.

The uncertainty of estimates is best conveyed by standard errors or confidence intervals. Generally, tables should not routinely report t-statistics or p-values for tests of the null hypothesis that each coefficient is zero Regression tables should report the estimate and standard errors of the coefficients, without stars indicating significance levels. While discouraged, it is acceptable to use asterisks, “stars”, or other symbols to represent varying levels of statistical significance.

Footnotes

Explanatory footnotes may be included but should not be used for simple citations. Simple citations should be made in-line in author-year format. Do not use end notes.

Citations and References

References should be listed in a separate section titled “References.” Citations should be in the author-year format, and references can follow your formats of preference, Chicago, MLA, APA, are some common styles used in the science sciences.

Data citation

All data used in the paper must be cited. Citations must include enough information for readers to find the original sources, and for those original sources to be consistently identified in the future. Data citations should appear in the the manuscript’s reference list.