The MS program in Engineering Science with a focus on Data Science gives students a core foundation in big data and analysis through knowledge, expertise and training in data collection and management, data analytics, scalable data-driven discovery, and fundamental concepts.
The program is intended for engineering and natural/mathematical science students.
This applied program trains students in the emerging, high-demand area of data and computing sciences. Many employment surveys have highlighted the great need for suitably trained professionals in these areas, estimating personnel shortfalls in the US alone at as high as 150,000 a year.
Students will be trained in sound basic theory with an emphasis on practical aspects of data, computing and analysis. Graduates will be able to serve the analytics needs of employers and will be exposed to several areas of application. The degree can be specialized using electives and a project. Classes will be modestly sized and emphasize best classroom practices while employing online resources to reinforce the classroom experience.
Students in this program will need some prior knowledge of mathematics, statistics and computing (commensurate with that from an engineering/natural science/math undergraduate program; see below for details). The program can be completed in one calendar year of study.
The University at Buffalo has responded aggressively to these trends, first by establishing a doctoral program in Computational and Data Sciences. UB has been a research pioneer in these areas. Its faculty have deep expertise and decades of experience, and its assets include the world-leading Center for Computational Research, with unmatched facilities for big computing and data.
Some prior knowledge of mathematics, statistics and computing (commensurate with that from an engineering/natural science/math undergraduate program) is required.
Equivalent of a B average or better in a recognized undergraduate program; GRE: 300+ (waived for recent UB undergraduate students)
Calculus, Multivariate Calculus, Linear Algebra (e.g., UB course MTH 309)
Basic Statistics and Probability
Programming (at least one language - C/C++/Python/Java), Data Structures (e.g., UB course CSE 113)
Course plan for full-time students:
Introduction to Probability Theory for Data Science (3 credits)
EAS 595 (see description below)
Introduction to Numerical Mathematics for Computing and Data Scientists (3 credits)
EAS 596 (see description below)
Statistical Data Mining I (3 credits)
EAS 506 (as of Fall 2018)
Programming and Database Fundamentals for Data Scientists (3 credits)
EAS 503 (see description below)
Statistical Data Mining II (3 credits)
EAS 507 (as of Fall 2018)
Introduction to Machine Learning (3 credits)
Elective 1 (3 credits)
See list below
Data Models and Query Languages (3 credits)
CSE 560 (as of Fall 2018)
** The Data Science Survey Course will include weekly modules on application-oriented and other relevant topics, including: data science for bioinformatics, data science for health informatics, data science for engineering applications, ethics and privacy, and data science for finance.
*** Students will work with an affiliated faculty member on a Data Science Project. Projects will be sourced from industry where feasible.
This course provides basic background on probability theory at a beginning graduate level. Topics include introductory probability concepts, discrete and continuous random variables and probability distributions, joint probability distributions, random sampling and data description, point estimation of parameters, random variables, derived probability distributions, discrete and continuous transforms and random incidence. As time permits, the course introduces elementary stochastic processes including Bernoulli and Poisson processes.
*Note: New course is being finalized for approval.
The aim of this course is:
*Note: New course is being finalized for approval.
This course presents statistical models for data mining, inference and prediction. The focus will be on supervised learning, which concerns outcome prediction from input data. Students will be introduced to a number of methods for supervised learning, including linear and logistic regression, shrinkage methods, lasso, partial least squares, tree-based methods, model assessment and selection, model inference and averaging, and neural networks. Computational applications will be presented using R and high-dimensional data to reinforce theoretical concepts.
This course introduces students to Computer Science fundamentals for building basic data science applications. The course has two components. The first part introduces students to algorithm design and implementation in a modern, high-level, programming language (currently, Python). It emphasizes problem-solving by abstraction. Topics include data types, variables, expressions, basic imperative programming techniques including assignment, input/output, subprograms, parameters, selection, iteration, Boolean type, and expressions, and the use of aggregate data structures including arrays. Students will also have an introduction to the basics of abstract data types and object-oriented design. The second part introduces students to database design and the use of databases in applications, with a short introduction to the internals of relational database engines. It includes extensive coverage of the relational model, relational algebra, and SQL. Many additional key database topics from the design and application-building perspective are also covered, including indexes, views, transactions, and integrity constraints. There will be a programming project, which explores building an application in the high-level programming language covered in the first part, that includes connecting to a database using an appropriate connector and querying using SQL.
This course presents the topic of data mining from a statistical perspective, with attention directed towards both applied and theoretical considerations. An emphasis will be placed on supervised learning methods. Topics include: linear and logistic regression, discriminant analysis, shrinkage methods, subset selection, dimension reduction techniques, classification and regression trees, ensemble methods, neural networks, deep feedforward networks, and random forests. Model selection and estimation of generalization error will be emphasized.
Involves teaching computer programs to improve their performance through guided training and unguided experience. Takes both symbolic and numerical approaches. Topics include concept learning, decision trees, neural nets, latent variable models, probabilistic inference, time series models, Bayesian learning, sampling methods, computational learning theory, support vector machines, and reinforcement learning.
The course focuses on the issues of data models and query languages that are relevant for building present-day database applications. The following topics are addressed: Entity-Relationship data model, relational data model, relational query languages, object data models, constraints and triggers, XML and Web databases, and the basics of indexing and query optimization.
This course will provide students with an overview of data-driven analytics in different industry sectors. The class will feature a series of visiting lecturers; the faculty member teaching the class will provide overview and continuity and will grade homework and term papers.
This course will provide students with a final integrative project experience, obtained either in industry or at the university. In either case, students will apply the skills acquired in their other classes to carry out the project goals. Students will submit short reports to supervising faculty to ensure that learning objectives are being met.
Two out of the following courses can be selected as electives.
EAS XXX (new class) Exploratory Data Analysis and Visualization
CSE 535 Information Retrieval
CSE 562 Database Systems
CSE 573 Computer Vision
CSE 586 Large-Scale Distributed Systems
CSE 587 Data Intensive Computing
CSE 601 Data Mining for Bioinformatics
CSE 610 Deep Learning
CSE 633 Parallel Algorithms
CSE 635 Multimedia Information Retrieval
CSE 636 Data Integration
CSE 676 Deep Learning
**Students must have successfully completed CSE 574 before taking CSE 676. CSE 676 cannot be taken in the same semester as CSE 574.
CSE 740 Machine Learning and Big Data
CSE 674 Advanced Machine Learning
STA 517 Categorical Data Analysis
STA 546 Statistical Data Mining II
STA 567 Bayesian Statistics
MAE XXX (new class) Simulation Analytics
MAE XXX (new class) Data in Manufacturing
MAE 609 High Performance Computing
IE 575 Stochastic Methods
IE 535 Human Computer Interaction
EE 634 Principles of Information Theory and Coding
MTH 558/559 Mathematical Finance