The Master of Professional Studies (MPS) in Data Sciences and Applications program will train students in analytics, including standard methods in data mining and machine learning, so that they have the expertise to extract insights from large, heterogeneous data sets.
Students will also learn data management and manipulation, including database management, distributed and big data management, and cloud-based methodologies.
Students from all majors who are interested in developing data science skills and applying them are encouraged to apply.
Rachael Hageman Blair
709 Kimball Tower
This program was created in consultation with companies such as IBM, HP, Sentient Science, Calspan, M&T, and Moog. These companies provided input on the skills that they find difficult to locate in current hiring pools and that they anticipate needing in the future.
Demand for these skills is high. The McKinsey Global Institute estimates that, to take full advantage of big data in the United States, the job market will need an additional 140,000–190,000 trained personnel for "deep analytical talent positions" and 1.5 million more "data-savvy managers." A recent New York Times article observed that "Universities can hardly turn out data scientists fast enough." The national shortage of such talent is estimated to be at least 60%.
This Master of Professional Studies degree is skills-oriented and provides hands-on training in working with data, computing, and analysis. Students need some prior knowledge of mathematics, statistics, and computing; bridge classes are available to prepare students for success in the program. We are particularly interested in students from non-traditional backgrounds who want and need the skills that are the focus of this program.
The program can be completed either in one calendar year of intensive study or over a more standard four semesters of full-time study.
Upon successful completion of the MPS degree, students will be expected to be able to:
Note: An asterisk (*) indicates a new course that is being finalized for approval.
CDA 501/EAS 503 Introduction to Data Driven Analysis
This course introduces students to the computer science fundamentals needed to build basic data science applications. The course has two components. The first part introduces students to algorithm design and implementation in a modern, high-level programming language (currently, Python), with an emphasis on problem-solving by abstraction. Topics include data types, variables, and expressions; basic imperative programming techniques, including assignment, input/output, subprograms, parameters, selection, iteration, and Boolean types and expressions; and the use of aggregate data structures such as arrays. Students are also introduced to the basics of abstract data types and object-oriented design. The second part covers regression analysis and an introduction to linear models. Topics include multiple regression, analysis of covariance, least-squares means, logistic regression, and nonlinear regression. Students learn to implement regression models as computer programs and use the resulting applications to analyze synthetic and real-world data sets.
CDA 502/MGS 613 Database Management Systems
This course provides a basic understanding of relational databases, including normalization, database schemas, and relational algebra. Students learn to create, update, query, and delete tables using standard SQL statements; understand workflows such as ETL (extract, transform, and load) that aggregate data from multiple sources and integrate it into databases and data warehouses; use, manage, and customize NoSQL databases, including key-value, wide-column, document, and graph stores, and apply them to non-tabular data; and use, manage, and customize graph databases and apply them to multi-dimensional datasets.
CDA 511 Introduction to Numerical Analysis
A first course on the design and implementation of numerical methods for solving the most common types of problems arising in science and engineering. Most such problems cannot be solved with a closed-form analytical formula, but many can be handled with the numerical methods learned in this course. Topics for the two semesters include: how a computer does arithmetic; solving systems of simultaneous linear or nonlinear equations; finding eigenvalues and eigenvectors of (large) matrices; minimizing a function of many variables; fitting smooth functions to data points (interpolation and regression); computing integrals; solving ordinary differential equations (initial and boundary value problems); and solving partial differential equations of elliptic, parabolic, and hyperbolic types. We study how and why numerical methods work, as well as their errors and limitations. Students gain practical experience through course projects that involve writing computer programs.
CDA 531/MTH 511 Probability and Data Analysis
Topics include: a review of probability, conditional probability, and Bayes' Theorem; random variables and distributions; expectation and its properties; covariance, correlation, and conditional expectation; special distributions; the Central Limit Theorem and its applications; estimation, including Bayes estimation; and estimators, maximum likelihood estimators, and their properties. The course also covers the use of sufficient statistics to "improve" estimators, the distribution of estimators, unbiasedness, hypothesis testing, linear statistical models, and statistical inference from the Bayesian point of view.
CDA 541/STA 545 Statistical Data Mining 1
This course presents statistical models for data mining, inference, and prediction. The focus is on supervised learning, which concerns predicting outcomes from input data. Students are introduced to a number of supervised learning methods, including linear and logistic regression, shrinkage methods such as the lasso, partial least squares, tree-based methods, model assessment and selection, model inference and averaging, and neural networks. Computational applications using R on high-dimensional data reinforce the theoretical concepts.
CDA 546/STA 546 Statistical Data Mining 2
This course presents data mining from a statistical perspective, with attention to both applied and theoretical considerations. The emphasis is on unsupervised learning methods, especially those designed to discover and exploit hidden structure in high-dimensional data. Topics include hierarchical and center-based clustering, principal component analysis, data visualization, random forests, directed and undirected graphical models, and special considerations when p >> n. Computational applications to high-dimensional data are presented using MATLAB and R to illustrate methods and concepts.
CDA 542/CSE 574 Machine Learning
Humans have an uncanny ability to learn from their mistakes and adapt to new environments by relying on past experience. Machine learning focuses on the question "How can we write a computer program that improves its performance through experience?" Machine learning has a huge number of practical applications, especially in the present era of big data, in which staggering volumes of diverse data from almost every facet of society, science, engineering, and commerce present opportunities for valuable discoveries. For example, machine learning is being used to understand financial markets, the impact of climate change on society, protein-protein interactions, and diseases. Its applications range from self-driving cars to never-ending language learning systems. This course focuses on the mathematical and statistical foundations of machine learning and also covers the core set of techniques and algorithms needed to understand its practical applications. The course presents an integrated view of machine learning, statistics (classical and Bayesian), data mining, and information theory. A basic understanding of probability, statistics, algorithms, and linear algebra is expected. Familiarity with Python is required for homework assignments and for following in-class demonstrations.
CDA 551/MGS 639 Cybersecurity, Privacy, and Ethics
This course covers the present-day terms, philosophies, technologies, and strategies that go into buttressing an organization's cybersecurity posture. Managing the resources of a corporate information assurance program while continually improving its risk footprint and response underpins all of the topics covered. Students critically examine concepts such as networking, system administration, and system security, and they learn to identify and apply basic security-hardening techniques. Students gain practical experience in a virtualized lab environment, where they build and secure a small corporate network.
*CDA 561 Major Applications - Health, Social, Finance, Science and Engineering
This course provides students with an overview of data-driven analytics in different industry sectors. The class features a series of visiting lecturers. The faculty member teaching the course provides an overview and continuity across the lectures and grades the homework and term papers.
CDA 571/ Project Guidance
This course provides students with a final, integrative project experience, completed either in industry or at the university. In either setting, students apply the skills acquired in their other courses to achieve the project goals. Students submit short reports to supervising faculty to ensure that learning objectives are being met.