BIOSTATISTICS (BIOS)
Courses
Academic credit for approved internship experience.
Directed readings or laboratory study. May be taken more than once. Two to six laboratory hours a week.
Access to SAS and Excel required. Permission of instructor for nonmajors. Introductory course in probability, data analysis, and statistical inference designed for B.S.P.H. biostatistics students. Topics include sampling, descriptive statistics, probability, confidence intervals, tests of hypotheses, chi-square distribution, 2-way tables, power, sample size, ANOVA, non-parametric tests, correlation, regression, and survival analysis.
Required preparation, previous or concurrent course in applied statistics. Permission of instructor for nonmajors. Introduction to use of computers to process and analyze data, concepts and techniques of research data management, and use of statistical programming packages and interpretation. Focus is on use of SAS for data management and reporting.
Students will gain proficiency with R, data wrangling, data quality control and cleaning, data visualization, exploratory data analysis, with an overall emphasis on the principles of good data science, particularly reproducible research. The course will also develop familiarity with several software tools for data science best practices, such as Git, Docker, Jupyter, Make, and Nextflow.
Arrangements to be made with the faculty in each case. A course for students of public health who wish to make a study of some special problem in the statistics of the life sciences and public health. Honors version available.
Required preparation, knowledge of basic descriptive statistics. Major topics include elementary probability theory, probability distributions, estimation, tests of hypotheses, chi-squared procedures, regression, and correlation.
Topics will include gaining proficiency with R and Python, data wrangling, data quality control and cleaning, data visualization, exploratory data analysis, and introductory applied optimization, with an overall emphasis on the principles of good data science, particularly reproducible research. Some emphasis will be given to large data settings such as genomics or claims data. The course will also develop familiarity with software tools for data science best practices, such as Git, Docker, Jupyter, and Nextflow.
An introductory course in machine learning. The goal is to equip students with knowledge of existing tools for data analysis and to prepare them for more advanced courses in machine learning. Students in the SPH Master of Public Health with a Public Health Data Science concentration receive priority for enrollment.
Designed for health care professionals who need to appraise the design and analysis of medical and health care studies and who intend to pursue academic research careers. Covers basics of statistical inference, analysis of variance, multiple regression, and categorical data analysis. Previously offered as PUBH 741. Permission of instructor.
Continuation of BIOS 641. Main emphasis is on logistic regression; other topics include exploratory data analysis and survival analysis. Previously offered as PUBH 742.
Required preparation, basic familiarity with statistical software (preferably SAS; able to do multiple linear regression) and introductory biostatistics, such as BIOS 600. Continuation of BIOS 600. Analysis of experimental and observational data, including multiple regression and analysis of variance and covariance. Previously offered as BIOS 545. Permission of the instructor for nonmajors.
Required preparation, two semesters of calculus (such as MATH 231, 232). Fundamentals of probability; discrete and continuous distributions; functions of random variables; descriptive statistics; fundamentals of statistical inference, including estimation and hypothesis testing.
Required preparation, three semesters of calculus (such as MATH 231, 232, 233). Introduction to probability; discrete and continuous random variables; expectation theory; bivariate and multivariate distribution theory; regression and correlation; linear functions of random variables; theory of sampling; introduction to estimation and hypothesis testing.
Distribution of functions of random variables; Helmert transformation theory; central limit theorem and other asymptotic theory; estimation theory; maximum likelihood methods; hypothesis testing; power; Neyman-Pearson Theorem, likelihood ratio, score, and Wald tests; noncentral distributions.
Principles of study design, descriptive statistics, sampling from finite and infinite populations, inferences about location and scale. Both distribution-free and parametric approaches are considered. Gaussian, binomial, and Poisson models, one-way and two-way contingency tables.
Required preparation, BIOS 662. Matrix-based treatment of regression, one-way and two-way ANOVA, and ANCOVA, emphasizing the general linear model and hypothesis, as well as diagnostics and model building. Reviews matrix algebra. Includes statistical power for linear models and binary response regression methods.
Fundamental principles and methods of sampling populations, with emphasis on simple random, stratified, and cluster sampling. Sample weights, nonsampling error, and analysis of data from complex designs are covered. Practical experience through participation in the design, execution, and analysis of a sampling project.
Introduction to the analysis of categorized data: rates, ratios, and proportions; relative risk and odds ratio; Cochran-Mantel-Haenszel procedure; survivorship and life table methods; linear models for categorical data. Applications in demography, epidemiology, and medicine.
Matrix-based longitudinal data analysis emphasizing applications and interpretation. Linear and generalized linear, marginal and mixed regression models. Fixed effects and random effects. Maximum likelihood, REML, GEE. Regression diagnostics. Sample size. Simulation of longitudinal data.
Statistical concepts in basic public health study designs: cross-sectional, case-control, prospective, and experimental (including clinical trials). Validity, measurement of response, sample size determination, matching and random allocation methods.
Provides a foundation and training for working with data from clinical trials or research studies. Topics: issues in study design, collecting quality data, using SAS and SQL to transform data, typical reports, data closure and export, and working with big data.
Source and interpretation of demographic data; rates and ratios, standardization, complete and abridged life tables; estimation and projection of fertility, mortality, migration, and population composition.
Selected topics in calculus and real analysis, including Taylor series; Riemann, Stieltjes, and Lebesgue integration; and complex variables. Introduction to measure theory.
This course introduces intermediate concepts and theories in statistical inference, including multivariate transformation, convergence of random vectors, sufficient and complete statistics, methods of estimation, and advanced problems such as information inequality, unbiased estimators, Bayes estimators, asymptotically efficient estimation, nonparametric estimation, and simultaneous confidence intervals.
Introduction to concepts and techniques used in the analysis of time to event data, including censoring, hazard rates, estimation of survival curves, regression techniques, applications to clinical trials.
Field/topical/research seminar. Instructors use this course to offer instruction in particular topics or approaches.
Field visits to, and evaluation of, major nonacademic biostatistical programs in the Research Triangle area. Field fee: $25.
Directed research. Written and oral reports required.
Directed research. Written and oral reports required.
Permission of the department for students with a passing grade on either doctoral qualifying examination in biostatistics. BIOS 700 will introduce doctoral students in biostatistics to research skills necessary for writing a dissertation and for a career in research.
Required preparation, one undergraduate-level programming class. Teaches important concepts and skills for statistical software development using case studies. After this course, students will have an understanding of the process of statistical software development, knowledge of existing resources for software development, and the ability to produce reliable and efficient statistical software.
Permission of the instructor. Statistical theory applied to special problem areas of timely importance in the life sciences and public health. Lectures, seminars, and/or laboratory work, according to the nature of the special area under study.
This course will introduce the methods used in clinical trials. Topics include dose-finding trials, allocation to treatments in randomized trials, sample size calculation, interim monitoring, and non-inferiority trials.
Theory and application of nonparametric methods for various problems in statistical analysis. Includes procedures based on randomization, ranks and U-statistics. A knowledge of elementary computer programming is assumed.
Measure space, sigma-field, measurable functions, integration, conditional probability, distribution functions, characteristic functions, convergence modes, SLLN, CLT, Cramér-Wold device, delta method, U-statistics, martingale central limit theorem, UMVUE, estimating function, MLE, Cramér-Rao lower bound, information bounds, Le Cam's lemmas, consistency, efficiency, EM algorithm.
Elementary decision theory: admissibility, minimaxity, loss functions, Bayesian approaches. Hypothesis testing: Neyman-Pearson theory, UMP and unbiased tests, invariance, confidence sets, contiguous alternatives. Elements of stochastic processes: Poisson processes, renewal theory, Markov chains, martingales, Brownian motion.
Linear algebra, matrix decompositions, estimability, multivariate normal distributions, quadratic forms, Gauss-Markov theorem, hypothesis testing, experimental design, general likelihood theory and asymptotics, delta method, exponential families, generalized linear models for continuous and discrete data, categorical data, nuisance parameters, over-dispersion, multivariate linear model, generalized estimating equations, and regression diagnostics.
Continuation of BIOS 664 for advanced students: stratification, special designs, multistage sampling, cost studies, nonsampling errors, complex survey designs, employing auxiliary information, and other miscellaneous topics.
Theory and application of methods for categorical data including maximum likelihood, estimating equations and chi-square methods for large samples, and exact inference for small samples.
Presents modern approaches to the analysis of longitudinal data. Topics include linear mixed effects models, generalized linear models for correlated data (including generalized estimating equations), computational issues and methods for fitting models, and dropout or other missing data.
Required preparation, integral calculus. Life table techniques; methods of analysis when data are deficient; population projection methods; interrelations among demographic variables; migration analysis; uses of population models.
The course will review major statistical methods for the analysis of MRI and its applications in various studies.
Fundamental concepts, including classifications of missing data, missing covariate and/or response data in linear models, generalized linear models, longitudinal data models, and survival models. Maximum likelihood methods, multiple imputation, fully Bayesian methods, and weighted estimating equations. Focus on biomedical sciences case studies. Software packages include WinBUGS, SAS, and R.
This advanced machine learning course, designed for PhD students in biostatistics and related fields, centers on cutting-edge tools in ML, encompassing theory, methods, and applications. It is motivated by complex biomedical data problems, offering in-depth exploration of technical details, model understanding, and the strengths and weaknesses of various approaches. The aim is to provide a comprehensive understanding of state-of-the-art ML tools for effectively analyzing and solving intricate biomedical data challenges.
Statistical concepts and techniques for evaluating medical diagnostic tests and biomarkers for detecting disease. Measures for quantifying test accuracy. Statistical procedures for estimating and comparing these quantities, including regression modeling. Real data will be used to illustrate the methods. Developments in recent literature will be covered.
This course will consider drawing inference about causal effects in a variety of settings using the potential outcomes framework. Topics covered include causal inference in randomized experiments and observational studies, bounds and sensitivity analysis, propensity scores, graphical models, and other areas.
In this course, we will address precision medicine from a statistical and machine learning perspective with numerous examples of application. We will develop a working knowledge of the following inter-related areas in the context of precision medicine and precision health: dynamic treatment regimes; causal inference for precision medicine; study designs such as SMARTs; and basic and advanced machine learning and artificial intelligence tools, including deep learning, outcome weighted learning, reinforcement learning, and Markov decision processes.
Topics include Bayes' theorem, the likelihood principle, prior distributions, posterior distributions, predictive distributions, Bayesian modeling, informative prior elicitation, model comparisons, Bayesian diagnostic methods, variable subset selection, and model uncertainty. Markov chain Monte Carlo methods for computation are discussed in detail.
Counting process-martingale theory, Kaplan-Meier estimator, weighted log-rank statistics, Cox proportional hazards model, nonproportional hazards models, multivariate failure time data.
An introduction to statistical procedures in human genetics, Hardy-Weinberg equilibrium, linkage analysis (including use of genetic software packages), linkage disequilibrium and allelic association.
This course provides a comprehensive survey of the statistical methods for the design and analysis of genetic association studies, including genome-wide association studies and next-generation sequencing studies. Students will learn the theoretical justifications for the methods as well as the skills to apply them to real studies.
Molecular biology, sequence alignment, sequence motif identification by Monte Carlo Bayesian approaches, dynamic programming, hidden Markov models, computational algorithms, statistical software, high-throughput sequencing data and its application in computational biology.
Clustering algorithms, classification techniques, statistical techniques for analyzing multivariate data, analysis of high dimensional data, parametric and semiparametric models for DNA microarray data, measurement error models, Bayesian methods, statistical software, sample size determination in microarray studies, applications to cancer.
Theory and applications of empirical process methods to semiparametric estimation and inference for statistical models with both finite and infinite dimensional parameters. Topics include bootstrap, Z-estimators, M-estimators, semiparametric efficiency.
An introduction to the statistical collaborative process and leadership skills. Emphasized topics include problem solving, study design, data analysis, ethical conduct, teamwork, career paths, data management, and written and oral communication with scientists and collaborators.
Under supervision of a faculty member, the student interacts with research workers in the health sciences, learning to abstract the statistical aspects of substantive problems, to provide appropriate technical assistance, and to communicate statistical results.
This seminar course is intended to expose students to cutting-edge research topics and to help them choose a thesis topic. It also allows students to meet and learn from major researchers in the field.
Using lectures and group exercises, students are taught where and how biostatisticians can offer leadership in both academic and nonacademic public health settings.
Required preparation, a minimum of one year of graduate work in statistics. Principles of statistical pedagogy. Students assist with teaching elementary statistics to students in the health sciences. Students work under the supervision of the faculty, with whom they have regular discussions of methods, content, and evaluation of performance.
Permission of the instructor. Seminar on new research developments in selected biostatistical topics.
Individual arrangements may be made by the advanced student to spend part or all of his or her time in supervised investigation of selected problems in statistics.