DATA SCIENCE AND SOCIETY (DATA)
Additional Resources
Any courses approved after June 1, 2026, will not appear in the 2026-27 Academic Catalog but will be available in ConnectCarolina.
Courses
This course is a broad, high-level survey of the major aspects of data science, including ethics, best practices in communication (e.g., data visualization), mathematical/statistical concepts, and computational thinking. Students will gain an understanding of the fundamentals of data science to support the more in-depth, advanced coursework required for the data science majors. Honors version available.
This interdisciplinary course introduces the fundamentals of Artificial Intelligence (AI) to students of all majors. Through lectures, hands-on activities, and discussions, students will explore what AI is, how it works, and its growing impact on society. Emphasis is placed on understanding both the power and the limitations of modern AI, including ethical and societal considerations. By the end of the semester, students from any discipline will be able to critically understand and responsibly apply AI concepts in their field of interest.
In an era of rapid advancements in data science and AI, ethical concerns related to data-intensive technologies are now of utmost importance. This course immerses students in data science ethics, facilitating a comprehensive exploration of the intricate interplay between data and societal values. By nurturing critical thinking grounded in ethical theories, this course provides students with a strong foundation in designing and analyzing data-intensive ecosystems that emphasize values such as fairness, accountability, ethics, and transparency. Honors version available.
How do you become data literate? Data literacy is the ability to read, write, and communicate data in context, or in other words: perform data analysis, construct a data visualization, and then communicate that data. It is the story that gets told with the data. Data literacy helps us to understand data, learn about different types and scales of data, and understand why this is important in the world today.
Data structures provide a means to manage large amounts of data for use in our databases and indexing services. A data structure is a specialized format for organizing, processing, retrieving and storing data. There are several basic and advanced types of data structures, all designed to arrange data to suit a specific purpose. Data structures make it easy for users to access and work with the data they need in appropriate ways.
The ability to collect and analyze data has changed virtually every field, yet data scientists often lack the ability to present their findings in effective formats. This class uses storytelling to help you connect with your audience and present your data in compelling and understandable ways so stakeholders can make the right decisions with data. Through hands-on exercises, you'll learn the advantages and disadvantages of oral, visual, and written formats.
This course offers a chance to learn about some of the most foundational computational tools for data scientists: from the common shell commands to interact with local and remote systems, to the profilers that help inspect and optimize performant Python code. Too often, data scientists are expected to learn many of these computational nuts and bolts on their own, creating unexpected barriers to entry, shibboleths, and missed opportunities that these simple yet powerful tools can help unlock. Students will be able to avoid many of these pitfalls and extract even more value from courses that involve more specialized tools.
Instructor approval required. ULAs must have completed the course for which they are serving with a grade of B or higher. Selection of ULAs is at the discretion of faculty teaching the course for which the ULA is serving. During the semester they serve as ULAs, this course will provide support and structure to make them effective in their role, including training in pedagogy and university policies; ongoing mentorship and supervision; opportunities to reflect; and assessment and evaluation.
This course introduces undergraduates from both STEM and non-STEM backgrounds to practical artificial intelligence (AI) tools for writing, research, data analysis, creative media, and assisted coding (building applications through natural language and no-code platforms), among others. Students will gain hands-on experience with accessible, modern AI tools and solutions, learn best practices for prompt engineering, and experiment with strategies for effective communication with AI systems.
This course focuses on an approach grounded in understanding the core principles and practices of AI while engaging with its implications for humanity. Key concepts covered will include reliability, safety, transparency, accountability, and trustworthiness of AI systems, as well as mitigating the potential risks and harms that arise when AI technology is deployed and used by humans. By the end of the course, students should be able to articulate how AI systems can be designed to respect and prioritize human values and needs.
An ideal internship provides students with practical experience in an organization outside of UNC, doing work that is relevant to their UNC education. The internship should develop and enhance the students' professional skill sets and involve experiences that allow students to have responsibility for results that are of value to the organization. A signed learning contract is required in advance. Data science B.S. majors only. Permission of instructor and director of undergraduate studies required.
Students work with a faculty member to assist in research. Together with the faculty advisor, students work in the faculty member's research area, learn about that research, and, if needed, help evaluate results.
This course provides a holistic, interdisciplinary research toolkit for understanding the contact point between data and society. Students will explore science and technology studies (STS), AI ethics, and social theory, while learning from legal scholars, historians, and computer scientists. The course examines how different approaches involve distinct normative assumptions, blind spots, and even value judgments about technology and society. It focuses on developing skills for finding, reading, and synthesizing research across disciplines, and for effectively translating knowledge between contexts, especially between technical and social perspectives on datafication, algorithms, and AI. No technical background is required.
This class is a combination of mathematical theory and practical numerical implementation and experimentation within the context of data science. Students are expected to be familiar with programming concepts, and the class will focus on numerical computing using those concepts. Programming languages are up to the instructor, but expect a hybrid approach with a post-processing part that involves visualization/graphics and a lower-level approach in languages such as C++, Julia, Python, etc., depending on the instructor. Mixing languages is encouraged. While COMP 110 or COMP 116 is required as a prerequisite, programming proficiency beyond these courses is preferred.
An advanced undergraduate course in the mathematical and statistical principles for data science. It aims to equip students with a rigorous theoretical foundation that is consolidated through hands-on lab studies. The emphasis is on critical thinking and problem-solving skills that enable students to utilize the theoretical tools in future research or practical work in statistics, actuarial science, machine learning, analytics, applied mathematics, and other areas.
Understanding system behavior is key to designing effective solutions, especially in data science and other disciplines with inherent complexity. DATA 442 introduces systems thinking and analysis methods and prepares students with effective methods and modeling techniques for system design. Students learn about the structures of systems, analyze system dynamics, and discover how to use feedback mechanisms for system control. Knowledge of introductory physics is preferred, often fulfilled with AP Physics, Physics 114 or Physics 118. If you have questions about the prerequisites or the sufficiency of your knowledge of physics, please reach out to the instructor.
This course is designed to give undergraduate students hands-on data science experience using real-world research requests. Students will learn and use the Data Science Project Cycle framework to complete a project from an external client. Clients from an array of industries, as well as those from within UNC, will submit project requests that are feasible within a 12-week period. The course will offer an authentic learning experience to develop students' research skills and professional attributes, such as teamwork, communication, and project management, in preparation for workforce entry.
An ideal internship provides students with practical experience in an organization outside of UNC, doing work that is relevant to their UNC education. The internship should develop and enhance the students' professional skill sets and involve experiences that allow students to have responsibility for results that are of value to the organization. A signed learning contract is required in advance. Data science B.S. majors only. Permission of instructor and director of undergraduate studies required.
For data science majors only and by permission of the instructor. Independent research conducted under the direct mentorship of a data science faculty member. If repeated, the repeated course cannot be counted for the major. Students work with a faculty advisor to develop a research idea, learn how to gather evidence, evaluate and deliver results, and consider future impacts or follow-on work.
By permission of the director of undergraduate studies. A deeper investigation under the supervision of a faculty member of topics in data science that may be, but need not be, connected with an existing course.
Recommended preparation, BIOL 101. This course integrates statistical and bioinformatics methodologies with data science techniques to analyze biological data. Students will learn statistical inference, predictive modeling, and machine learning approaches applied to biological datasets such as genomic, epigenomic, sequencing, and population genetics data. The course emphasizes hands-on experience with R, Python, and bioinformatics databases, preparing students to interpret complex biological data and make data-driven decisions in health informatics and biomedical research.
This research-focused course immerses students in socially and ethically responsible Artificial Intelligence. Emphasizing hands-on experience, the course guides students through the intricacies of conducting ethically responsible AI research with a keen focus on AI fairness, societal impacts, and real-world applications. Students collaborate in teams, selecting their preferred research area and societal problem from broader data science themes, and pursue a semester-long data science project under the guidance of the faculty and advanced graduate student teaching assistants. The course curriculum spans crucial topics including exploration of emerging trends in AI and data science.
This course provides a comprehensive introduction to the foundations of artificial intelligence. Students will explore a range of topics including search algorithms, constraint satisfaction, and optimization problems, as well as logic and reasoning. The course will introduce probabilistic reasoning, decision theory, and Markov decision processes as frameworks for decision-making under uncertainty. In addition, students will learn the fundamentals of machine learning and examine key issues in trustworthy AI, focusing on fairness, interpretability, and ethical considerations. This course emphasizes both theoretical understanding and practical applications, preparing students to analyze and design AI systems in a variety of domains.
This course focuses on the practical implementation of deep learning systems, providing students with hands-on experience in designing, building, and optimizing machine learning (ML) and deep learning applications. Students will learn the foundational concepts of machine learning systems, gaining insights into the end-to-end process of developing a machine learning application. The course will cover essential techniques and tools for implementing modern deep learning algorithms across key areas such as computer vision, natural language processing, graph analysis, reinforcement learning, and generative models.
This course explores cutting-edge machine learning (ML) methods, their applications across various domains, and foundational methods in modeling and data mining for artificial intelligence (AI). Topics include transfer learning, representation learning, graph mining, fundamentals of generative models, and uncertainty estimation. The course also presents case studies, hands-on coding assignments, and a final group project allowing students to implement scalable algorithms and address real-world challenges in AI. Students will gain a solid understanding of the underlying concepts, algorithms, and applications of AI, and practical experience in developing and implementing AI systems using modern data mining methods and modeling approaches.
Introduces the motivations, objectives, and principles of financial risk management through the lens of insurance, reinsurance and financial institutions. Students will become familiar with key concepts that shape these industries so they can effectively communicate using industry vocabulary, metrics, and tools. Standards governing financial risk management are introduced as are the different types of risks that financial institutions, insurers and reinsurers analyze when conducting business. Students will make use of software and tools to characterize and price risk in various activities, carry out basic quantitative risk assessments, and learn what drives success and failure in financial risk management.
Society's growing exposure to the financial risks associated with natural hazards (e.g., flood, drought, extreme temperatures) has made it increasingly important to both accurately quantify these risks and develop innovative strategies for managing them. This course provides exposure to the fundamentals of financial risk management with application to natural hazards and an emphasis on developing coupled models that consider natural variability, engineered/managed structures, and financial/economic factors. Students will learn to (i) model the financial risk posed by extreme events; (ii) understand the merits of various risk management tools; and (iii) develop effective strategies for managing natural hazard-based financial risk.
Students are introduced to advanced techniques in data science, machine learning, and artificial intelligence and their application to the management of financial risks. Students will learn to discover, process, and visualize natural hazard and financial data, and will be taught to quantify various financial risks (e.g., natural hazards) and design management strategies to mitigate negative outcomes. Students will learn basic programming methods and apply data analysis and machine learning techniques to model the complex systems that give rise to risk. Structured case studies and in-class assignments will help students build expertise to be used in longer group projects.
The course focuses on understanding the complex challenges the RMI industry is facing and how to think innovatively from various perspectives to solve them. Many factors are challenging the insurance industry today, including: 1) uncertainty surrounding inflation and interest rates; 2) new regulations to protect customers and avoid systemic risk; 3) consumer demand for radically different customer experiences; and 4) high technology, which is introducing new rules and strengthening competition. Students will develop knowledge and skills to understand how these elements affect the RMI industry, critically analyze them, and develop novel ideas to solve them.
Permission of the instructor for students lacking the requisite. This course is designed to give undergraduate and graduate students exposure to the challenges of managing the financial risk of uncertainty in the value of liabilities and assets within an insurance/reinsurance/risk management context. Students will learn to use actuarial and financial market data to make risk-based decisions related to setting insurance premiums and allocating assets based on estimates of future liabilities (e.g., claims, market losses). The course will introduce students to the properties of different asset classes and hedging tools and demonstrate different methods to assemble portfolios that balance profitability with the risk of insolvency.
This course provides an introduction to the physical processes, statistical methods, and analytical tools used to assess risks that arise from natural hazard events, with an emphasis on extreme precipitation (floods) and hurricanes. Students will examine the dynamics of extreme weather events and hydrologic hazards, learn modern approaches to extreme value analysis, and work with real-world datasets and industry tools to build models of financial risk. Some programming experience is helpful but not required. Previously offered as EMES 414/ENEC 514/GEOL 514.
This course has variable content and may be taken multiple times for credit. Different sections may be taken in the same semester.
Data science B.S. majors only. A signed learning contract is required in advance. An experience providing students with practical experience in an organization outside of UNC, coupled with reflective practices during the semester-long experience. The internship should develop and enhance the student's professional skill sets and involve experiences that allow students to have responsibility for results that are of value to the organization. Permission of instructor and director of undergraduate studies required.
For data science majors only and by permission of the instructor. Individual student research for students pursuing an honors thesis in data science under the supervision of a departmental faculty advisor. Majors only.
For data science majors only and by permission of the instructor. Individual student research for students pursuing an honors thesis in data science under the supervision of a departmental faculty advisor. Majors only.
Many students entering quantitative graduate programs have encountered key mathematical tools in a mechanical way but without conceptual depth. This course deepens understanding of vector spaces, linear maps, spectral methods, probability, and optimization, all framed through data-relevant examples. The emphasis is on a rigorous treatment, assuming a few standard prerequisites.
Students learn Python 3 syntax and semantics; core data structures; control flow and functions; exceptions and robust file/stream I/O; typing and lightweight data contracts; iterators, generators, and context managers for streaming; and algorithmic thinking with a focus on sorting and tree/graph search basics. Practical engineering skills include Git/GitHub (with preliminary SSH), project structure and packaging, unit testing, documentation, an orientation to IDEs, the Linux command line, and AI-assisted code review, among other technical details. The emphasis is on writing clear, reliable, and reproducible Python that scales from notebooks to small, well-structured scripts and packages.
This course aims to teach the fundamental statistical concepts and theory underlying data science. We will explore key topics such as descriptive statistics, probability theory, sampling distributions, the central limit theorem, hypothesis testing, regression analysis, and nonparametric methods, along with an introduction to multivariate techniques like principal component analysis (PCA). By building a strong theoretical foundation, we will learn to recognize how these essential ideas underpin even the most advanced methods and algorithms in data science and machine learning.
This course introduces foundational skills to manage and process large-scale, messy data. The course covers key concepts such as ETL data pipelines, cloud computing, database systems, and scalable algorithms. Students will gain hands-on experience using tools like SQL, Python (Pandas), and UNC's Longleaf cluster, while also exploring broader issues like data privacy, access control, and system performance.
This course focuses on the technical and philosophical foundations of designing AI systems aligned with human values. Students will engage deeply with cutting-edge research through critical analysis, examining computational approaches to value learning, preference modeling, and alignment verification alongside ethical frameworks for responsible AI development. The course emphasizes rigorous paper critique, technical depth, and synthesis of alignment research through individual scholarly writing.
This course discusses foundational stages in the data science lifecycle, using a common modern toolkit, and introducing methods and implementations of relational database management systems suited for data science applications. Students gain fluency across the full data science workflow, while also developing data science habits (version control, documentation, critical thinking about data, among others) expected in modern data science projects.
In this course, students will communicate about data science through written and oral communication and through data visualization using traditional and modern communication tools. Students will learn about and practice communicating data insights and technical information appropriately tailored to the audience through compelling narratives and storytelling, and they will create clear, concise, and well-structured written reports and presentations. Students will learn important data visualization principles (e.g., visual perception, design) and apply these principles in creating data visualizations using modern data visualization tools (e.g., Python and/or R). Students will also learn about responsible and ethical practices for communicating data research outputs.
This course is one of three courses designed for the Master of Science in Data Science students to develop the professional competencies needed to succeed in technical, interdisciplinary workplaces. The course emphasizes structured approaches to problem framing and project management that help translate technical expertise into actionable business impact. Students will practice professional communication strategies for explaining technical concepts and will cultivate professionalism and ethical responsibility in both academic and workplace contexts. Designed as a professional development experience, the course focuses on the soft skills, professional judgment, and workplace behaviors that complement technical data science training.
This course is one of three courses designed for the Master of Science in Data Science students to develop the professional competencies needed to succeed in technical, interdisciplinary workplaces. The course emphasizes structured approaches to problem framing and project management that help translate technical expertise into actionable business impact. Students will practice professional communication strategies for explaining technical concepts and will cultivate professionalism and ethical responsibility in both academic and workplace contexts. Designed as a professional development experience, the course focuses on the soft skills, professional judgment, and workplace behaviors that complement technical data science training.
This course is one of three courses designed for the Master of Science in Data Science students to develop the professional competencies needed to succeed in technical, interdisciplinary workplaces. The course emphasizes structured approaches to problem framing and project management that help translate technical expertise into actionable business impact. Students will practice professional communication strategies for explaining technical concepts and will cultivate professionalism and ethical responsibility in both academic and workplace contexts. Designed as a professional development experience, the course focuses on the soft skills, professional judgment, and workplace behaviors that complement technical data science training.
This course explores intermediate-level design and implementation of database systems, emphasizing scalable, distributed systems. Hands-on exercises in the course deepen students' knowledge of advanced relational database management and discuss current and emerging practices for dealing with big data and large-scale database systems. Concepts include design and implementation of relational databases, exploration of distributed data structures including graph, document, and key-value storage models and scalable and resilient query processing.
This course will provide students with advanced concepts on the construction and use of data structures and their associated algorithms. Concepts covered in this course will include: abstract data types, lists, stacks, queues, trees, and graphs; sorting, searching, hashing, and an introduction to numerical error control; techniques of algorithm analysis and problem-solving paradigms using relevant programming languages and tools.
This coding-oriented course covers the concepts underpinning statistical modeling and inference, along with their applications. Students build models with real-world data and modern data science toolkits in Python and R, such as scikit-learn and tidymodels. Concepts covered in this course include: foundations in probability, including basic rules, Bayes' formula, and basic distributions; sampling and the central limit theorem; bootstrapping, confidence intervals, hypothesis testing, and multiple testing; linear models, simple and multiple regression, inference for regression, and regularization; classification, logistic regression, and tree-based methods; and prediction, model interpretation, and model evaluation.
This course equips participants with practical tools to estimate causal effects in real-world settings. After building a solid formal foundation, students will learn to design experiments, leverage natural experiments, and analyze observational data using modern causal inference methods. Ideal for those who want to move beyond predictive analytics in order to answer causal questions in their work.
This course explores the foundational concepts of ethics in data science and AI. This overview sets the stage for a deep understanding of what ethical frameworks mean in practice, providing students the opportunity to create actionable examples. By focusing on a wide variety of case studies throughout a myriad of industries and settings, this class develops leaders who can effectively integrate and leverage data science solutions while ensuring responsible and transparent use of data in a variety of roles.
This course presents the mathematical intuition, theory, and techniques driving the numerical computation methods used for processing and analyzing data in various real-life problems. Topics include dimensionality reduction, linear and non-linear approximation, frequency and wavelet analysis, and a glimpse into the mathematics of deep neural networks, classification, large-scale and high-performance numerical computing. Each topic will be motivated by a data analysis challenge, introduce the mathematical intuition, theory and techniques used to address it, and conclude with a coding component with real data.
This course provides students with a foundational understanding of visual perception and data visualization design practices. Students gain expertise in using visualization for tasks such as exploratory analysis and storytelling to support both data-driven discovery and communication. The class focuses on hands-on experiences with commonly used data science tools and technologies.
Graduate students will lead groups of four to five undergraduate students to complete a project for an external client. Clients from an array of industries, as well as those from within UNC, will submit project requests that are feasible within a 12-week semester. The course will offer graduate students an authentic learning experience to develop management skills and professional attributes, such as teamwork, communication, and project management, in preparation for workforce entry.
This course equips students with knowledge of existing tools for predictive analytics and foundational concepts in machine learning. The course covers core principles of machine learning and pattern analysis. Topics include maximum likelihood estimation; regression; classification; cross-validation; generalization and overfitting; introduction to neural networks; nonparametric estimators; clustering; tree-based methods; autoencoders; and kernel methods. Applications in tabular, image, and textual data for supervised and unsupervised learning tasks are covered.
This course provides students with an in-depth look at deep learning fundamentals and applications, with emphasis on their broad applicability to problems across a range of disciplines. Students learn to tackle practical issues that arise during the life cycle of data, both in the cloud and at the edge. Topics include regularization, optimization, convolutional networks, sequence modeling, generative learning, instance-based learning, and deep reinforcement learning. Students complete several substantive programming assignments using relevant modern tools and frameworks.
This course introduces the design and operation of machine learning systems in cloud environments. Students gain hands-on experience deploying and monitoring models at scale, building data pipelines, and applying distributed computing. Emphasis is placed on leveraging cloud technologies for efficient data handling, AI applications, and end-to-end lifecycle management of ML solutions.
This course prepares data scientists with a strong foundation in machine learning to master the implementation and deployment of advanced AI systems in production environments. Reflecting the rapidly evolving landscape of 2024-2026, students will gain hands-on experience with state-of-the-art LLMs (GPT-4o, Claude 4, Gemini 2.5 Pro), advanced RAG architectures, production-grade agent frameworks, comprehensive security testing, cost optimization strategies, and industry-standard MLOps/LLMOps practices. The course emphasizes security as a foundational requirement, not an afterthought, and prepares students to build, deploy, and maintain production AI systems that are secure, scalable, cost-effective, and ethically sound.
This course introduces students to modern computational approaches for studying biology through DNA and protein sequence data. The course develops a unifying view of sequence count data across gene expression, microbiome studies, and deep mutational scanning. Students explore machine learning methods for protein analysis and design, including protein language models, variant effect prediction, functional annotation, active learning, and generative AI approaches for protein discovery. Throughout the course, emphasis is placed on understanding data-generating processes, statistical structure, modeling assumptions, and practical limitations of biological measurement, enabling students to critically evaluate and apply modern AI methods in biological research.
This course introduces the principles and methods used to analyze data that vary across both space and time. Students first learn the foundations of spatial data analysis, including spatial dependence, nonstationarity, scale effects, and spatial heterogeneity. Techniques for visualizing spatial patterns, quantifying spatial relationships, and building spatially aware models are introduced using modern data science tools. The course also covers time series analysis, focusing on identifying temporal patterns such as trends and seasonality, selecting appropriate forecasting models, and generating predictions at multiple horizons. Through hands-on analysis and professional projects using real-world datasets, students develop practical skills for modeling and interpreting spatial and temporal data.
This graduate course develops the probabilistic foundations that underlie modern data science, with a strong emphasis on rigor, precision, and critical reasoning. The course is organized around two central themes: (1) high-dimensional probability and concentration phenomena, and (2) stochastic processes and their applications. Throughout the semester, theoretical results are complemented by examples from real-world applications, such as generative AI, population genomics, biological and ecological modeling, and random network analysis, illustrating how probabilistic principles shape contemporary data-driven methodologies.
This course introduces the foundational principles and core algorithms of modern machine learning. Topics include linear and kernel-based models, neural networks, ensemble methods, dimensionality reduction, and clustering. The course also covers key elements of statistical learning theory, including VC dimension, along with modern perspectives such as Gaussian processes and scaling laws. Emphasis is placed on conceptual understanding, mathematical formulation, and the ability to analyze and compare machine learning methods.
This graduate-level course develops the linear algebra and numerical linear algebra needed for modern data science. We treat the subject rigorously, emphasizing precise definitions, theorem-proof development, and careful reasoning about algorithms. Core mathematical themes: vector spaces and duality; inner product spaces and norms; spectral theory; matrix factorizations; perturbation theory; and multilinear (tensor) algebra. Computational themes: numerical stability and conditioning; fast algorithms for structured matrices; iterative methods for large-scale linear systems and eigenproblems; and randomized methods for scalable matrix computations.
The course introduces computational and machine learning methods for analyzing high-dimensional datasets, using omics data as a primary motivating example. Students will explore data acquisition, preprocessing, visualization, and predictive modeling across diverse data types. Through hands-on assignments and a final project, students will learn to select and apply generalizable computational and machine learning methods to extract insights from complex datasets and generate scientifically meaningful hypotheses.
The goal of this course is to expose graduate students in any UNC department to a broad range of topics in the theory and applications of data science. Students will learn about current and emerging methods and techniques in data science to advance individual research efforts and facilitate interdisciplinary collaboration. Open to graduate students only and by permission only.
Master of Applied Data Science (MADS) students are required to complete a capstone project. Capstone projects challenge students to acquire and analyze data to solve real-world problems using techniques they have learned in the program. Project teams consist of three to four students who work together with a project sponsor to create a plan and execute it, culminating in a final presentation and delivery of a demonstrable artifact such as a dashboard or codebase.
