DATA SCIENCE AND SOCIETY (DATA)
Courses
This course is a broad, high-level survey of the major aspects of data science, including ethics, best practices in communication (e.g., data visualization), mathematical and statistical concepts, and computational thinking. Students will gain an understanding of the fundamentals of data science to support the more in-depth, advanced coursework that is required for the data science majors.
In an era of rapid advancements in data science and AI, ethical concerns related to data-intensive technologies are now of utmost importance. This course immerses students in data science ethics, facilitating a comprehensive exploration of the intricate interplay between data and societal values. By nurturing critical thinking grounded in ethical theories, this course provides students with a strong foundation in designing and analyzing data-intensive ecosystems that emphasize values such as fairness, accountability, ethics, and transparency.
How do you become data literate? Data literacy is the ability to read, write, and communicate data in context. In other words, it is the ability to perform data analysis, construct a data visualization, and then communicate the story that the data tells. Data literacy helps us to understand data, learn about different types and scales of data, and understand why this is important in the world today.
Data structures provide a means to manage large amounts of data for use in our databases and indexing services. A data structure is a specialized format for organizing, processing, retrieving and storing data. There are several basic and advanced types of data structures, all designed to arrange data to suit a specific purpose. Data structures make it easy for users to access and work with the data they need in appropriate ways.
The ability to collect and analyze data has changed virtually every field, yet data scientists often lack the ability to present their findings in effective formats. This class uses storytelling to help you connect with your audience and present your data in compelling and understandable ways so stakeholders can make the right decisions with data. Through hands-on exercises, you'll learn the advantages and disadvantages of oral, visual, and written formats.
The first part of this course introduces the stages of the data life cycle: defining data requirements; data creation and gathering; data fusion and preparation; data cleaning and quality control; and exploratory analytics, data interpretation, and visualization. We will explore FAIR data principles of curation, metadata, and digital preservation policies. The second part will introduce the concept of relational databases, which provide storage and management for structured data.
This course will explore intermediate-level design and implementation of database systems, emphasizing scalable, distributed systems. It will deepen students' knowledge of advanced relational database management and discuss current and emerging practices for dealing with big data and large-scale database systems. Concepts include design and implementation of relational databases; exploration of distributed data structures, including graph, document, and key-value storage models; and scalable and resilient query processing.
This course will provide students with advanced concepts on the construction and use of data structures and their associated algorithms. Concepts covered in this course will include: abstract data types, lists, stacks, queues, trees, and graphs; sorting, searching, hashing, and an introduction to numerical error control; techniques of algorithm analysis and problem-solving paradigms using relevant programming languages and tools.
The course will be coding-oriented and cover concepts such as foundations in probability, including basic rules, Bayes' theorem, and basic distributions; sampling and the central limit theorem; bootstrapping, confidence intervals, hypothesis testing, and multiple testing; linear models, basic and multiple regression, inference for regression, regularization; classification, logistic regression, and tree-based methods; and prediction, model interpretation, and model evaluation.
We will explore the foundational concepts of ethics in data science and AI. This overview will set the stage for a deep understanding of what ethical frameworks mean in practice, providing students the opportunity to create actionable examples. By focusing on case studies from a wide range of industries and settings, this class will develop leaders who can effectively integrate and leverage data science solutions while ensuring responsible use of data.
This course will present the mathematical intuition, theory, and techniques driving the numerical computation methods used for processing and analyzing data in various real-life problems. Topics include dimensionality reduction; linear and non-linear approximation; frequency and wavelet analysis; and a glimpse into the mathematics of deep neural networks, classification, large-scale and high-performance numerical computing, and visualization.
This course will provide students with a foundational understanding of visual perception and data visualization design practices, and provide instruction on using visualization for tasks such as exploratory analysis and storytelling to support both data-driven discovery and communication. The class will focus on hands-on experience with commonly used data science tools and technologies.
This course is an introduction to machine learning (ML). The course will cover core principles of artificial intelligence for statistical inference and pattern analysis. Topics will include probability distributions; graphical models; optimization, maximum likelihood estimation, and regression; classification; cross-validation; generalization and overfitting; neural networks; nonparametric estimators; clustering; autoencoders; generative models; and kernel methods. Applications to tabular, image, and textual data for supervised and unsupervised learning tasks will also be covered.
The goal of this course is to expose graduate students in any UNC department to a broad range of topics in the theory and applications of data science. Students will learn about current and emerging methods and techniques in data science to advance individual research efforts and facilitate interdisciplinary collaboration. Open to graduate students only and by permission only.