CDSE Days 2020 is cancelled

Due to caution regarding COVID-19 and the safety of our students, staff, faculty, speakers, and attendees, CDSE Days 2020 is cancelled. Please continue to visit the UB COVID-19 Information website for updates and important information related to this evolving situation.

6th Annual CDSE Days

March 30 - April 3, 2020 | University at Buffalo | Buffalo, NY

Overview of CDSE Days 2020

CDSE Days is an annual event that connects students and faculty from UB and regional colleges and universities, and professionals from local industry, with some of the nation's most prominent scholars of computational and data-enabled science. CDSE Days is a week-long event in Buffalo filled with workshops, lectures, and networking.

The CDSE Days 2020 agenda includes research talks on timely subjects such as the opioid epidemic, polling data analysis, and high-performance computing in the chemical sciences. Instructional workshops will also be conducted on Python, agent-based computational models (ABM), blockchain, SQL, machine learning, and using UB’s Center for Computational Research (CCR). The week will end with TED-style talks from industry partners and sessions on career opportunities for computational scientists.

Abstracts

Python and Jupyter for Programmers

Bernard Badzioch

A hands-on introduction to Python and the Jupyter Notebook for those with experience programming in other languages. Participants must bring a laptop with an up-to-date installation of the Anaconda distribution of Python 3. More information about the workshop and software installation can be found on the CDSE Days 2020 Python Workshop Website: https://cdse2020.github.io/
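
As a quick, hedged illustration (not part of the official workshop materials; the workshop website above is the authoritative source for setup instructions), a sanity check along these lines can confirm that an Anaconda-based Python 3 environment is ready:

    # Sanity check for a Jupyter/Anaconda setup; run in a notebook cell.
    import sys

    assert sys.version_info.major == 3, "Python 3 is required"
    print(sys.version)

    # Core scientific packages bundled with Anaconda should import cleanly,
    # and a plot rendering inline confirms the notebook is working.
    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, 2 * np.pi, 100)
    plt.plot(x, np.sin(x))
    plt.show()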

Agent-Based Modeling Tutorial and Workshop

Sara Metcalf

This workshop will introduce participants to the practice of agent-based modeling for dynamic simulation, demonstrate a range of examples, and guide participants through the process of developing an agent-based model. Agent-based models facilitate dynamic simulation of multi-scalar feedback mechanisms and of interactions between heterogeneous individual agents and their environments. Agents may represent people, animals, organizations, or other kinds of discrete decision-making entities.

Participants should bring a laptop computer with the free AnyLogic PLE (Personal Learning Edition) software installed. AnyLogic is a multi-method Java-based software platform that can be downloaded here: https://www.anylogic.com/downloads/
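
The workshop itself uses AnyLogic, but the core idea is easy to sketch in a few lines of Python. The following toy model (invented for illustration, not a workshop example) shows heterogeneous agents moving on a grid and interacting through a simple contagion rule:

    import random

    class Agent:
        """A discrete decision-making entity with a position and a state."""
        def __init__(self, x, y):
            self.x, self.y = x, y
            self.infected = False

        def step(self, size):
            # Each tick, the agent takes a random step on a bounded grid.
            self.x = max(0, min(size - 1, self.x + random.choice((-1, 0, 1))))
            self.y = max(0, min(size - 1, self.y + random.choice((-1, 0, 1))))

    def simulate(n_agents=100, size=20, ticks=50, p_transmit=0.5):
        agents = [Agent(random.randrange(size), random.randrange(size))
                  for _ in range(n_agents)]
        agents[0].infected = True                 # seed one infected agent
        for _ in range(ticks):
            for a in agents:
                a.step(size)
            # Environment-mediated interaction: co-located agents may transmit.
            for a in agents:
                if a.infected:
                    for b in agents:
                        if (b.x, b.y) == (a.x, a.y) and random.random() < p_transmit:
                            b.infected = True
        return sum(a.infected for a in agents)

    print("infected agents after simulation:", simulate())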

Predictive Multiscale Agent-Based Model of Tumor Growth

Danial Faghihi Shahrestani

Recent progress in computational and data science has enabled the scientific community to push forward the predictive modeling of multiscale/multiphysics phenomena that arise across science, engineering, and medicine. This has led to the recognition that successful predictive models to support high-consequence decisions (e.g., patient-specific cancer treatment) cannot arise from applications of data analytics alone. A host of new technologies must be developed to harness the predictive power of physics-based models, along with learning from uncertain data, to make reliable computational predictions.

This talk centers on developing predictive hybrid multiscale agent-based models (ABMs) of tumor development. We develop a new ABM that can simulate combinations of several multiscale processes involved in the dynamics of tumor development. The high computational cost of simulating discrete cells, the inherent stochasticity due to the probabilistic phenotypic transitions of tumor cells, and the presence of numerous and uncertain model parameters are the main drawbacks restricting the ability of ABMs to predict cancer growth. We explore the application of a physics-based machine learning method to train and validate the ABM against a set of in vitro measurements of human breast cancer. The method allows quantification of uncertainties in data and model parameters and assessment of the predictive capability of the ABM.
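
The abstract does not spell out the model's internals; purely as a hypothetical sketch of the stochastic phenotypic transitions it mentions, one might model each cell as a small Markov chain (all states and probabilities below are invented for illustration):

    import random

    # Hypothetical phenotypes and per-step transition probabilities, invented
    # for illustration; the actual ABM in the talk is calibrated to in vitro
    # human breast cancer measurements and is not reproduced here.
    TRANSITIONS = {
        "proliferative": {"proliferative": 0.80, "quiescent": 0.15, "apoptotic": 0.05},
        "quiescent":     {"proliferative": 0.10, "quiescent": 0.85, "apoptotic": 0.05},
        "apoptotic":     {"apoptotic": 1.00},   # absorbing state
    }

    def next_phenotype(state):
        """Sample the next phenotype from the current state's distribution."""
        r, cumulative = random.random(), 0.0
        for phenotype, p in TRANSITIONS[state].items():
            cumulative += p
            if r < cumulative:
                return phenotype
        return state

    def simulate_cell(steps=100):
        state, divisions = "proliferative", 0
        for _ in range(steps):
            if state == "proliferative":
                divisions += 1            # toy rule: proliferative cells divide
            state = next_phenotype(state)
            if state == "apoptotic":
                break
        return divisions, state

    # Repeated runs differ because transitions are probabilistic; aggregate
    # statistics need many realizations, one source of the high computational
    # cost the abstract describes.
    print([simulate_cell() for _ in range(5)])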

Computational Modeling of Drug Overdose Deaths

Donald S. Burke

For almost four decades, the curve of drug overdose deaths in the USA has tracked along a remarkably predictable exponential growth trajectory. To understand the mechanisms driving this sustained pattern of growth, so as to be able to implement more effective epidemic interventions, we conducted analytical and simulation modeling of the epidemic. Analysis by year of birth and age at death disclosed substantial epidemic structure: a sharp emergence of risk in the cohorts born immediately after World War II, an inexorable youth-ward shift in the age at death for all subsequent birth cohorts, and another intensification in younger generations. Patterns have also varied systematically according to demographic factors (sex, race, urbanicity) and geography, and with the introduction of new drugs. Novel methods of data visualization will be used to show and explain these epidemiological patterns, and progress toward development of computational simulations of the epidemic will be discussed.
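
To make the stated trajectory concrete: exponential growth means the logarithm of yearly deaths is linear in time, so the growth rate can be estimated with an ordinary least-squares fit. The sketch below uses invented counts, not the actual U.S. mortality data:

    import numpy as np

    # Invented yearly overdose-death counts following deaths(t) ~ A * exp(r*t)
    # with noise; log(deaths) is then linear in t, so np.polyfit recovers r.
    years = np.arange(1980, 2020)
    deaths = 8000 * np.exp(0.075 * (years - 1980)) \
             * np.random.lognormal(0, 0.05, years.size)

    rate, log_a = np.polyfit(years - 1980, np.log(deaths), 1)
    print(f"estimated growth rate: {rate:.3f}/year "
          f"(doubling time ~{np.log(2) / rate:.1f} years)")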

Research Computing and Data Science at UB: Center for Computational Research

Matthew Jones

One of the core missions of the Center for Computational Research (CCR) at UB is to enable research and scholarship at UB by providing UB researchers and educators with access to high-performance computing, data, and visualization resources, along with a wide range of guidance and services to facilitate faculty-led research, including software development, data analytics, and parallel computing. For this half-day session, CCR staff (along with Prof. Sal Rappocio of the Physics Department) will provide an overview of these resources and services, as well as a series of talks and discussions on selected topics of broad interest to the CDSE community.

Agenda:

9:00-9:30 CCR Mission/Engagement Overview/Update

9:30-10:00 CCR Resources Overview

10:00-10:30 Maximizing Slurm usage with scavenger

10:30-11:00 OpenStack private cloud service demo

11:00-11:30 Hub-based instruction and tools with VIDIA/R examples

11:30-12:00 Applications to computational physics (VIDIA via python/jupyter)

Using Git, LaTeX, and Make for Research Projects

Matthew Knepley

Computational science now depends on a shared software infrastructure that enables the use of cutting-edge hardware, optimal algorithms, and sophisticated data analysis and visualization strategies. In this tutorial, we will enable students to use the Git system for version control (including a useful policy layer), configuration and build tools, and the LaTeX documentation system.

Machine Learning on Ground Vibrations

Tolulope Olugboji

The science of using ground vibrations recorded on seismic sensors distributed across the globe allows us to (1) build maps of Earth’s interior, and (2) figure out whether destructive shaking recorded by these sensors is due to anthropogenic or natural causes. These are two of the topics that my research group investigates. We apply techniques in machine learning to facilitate the process of extracting useful information from ground vibration data, which leads to insights on Earth’s state, dynamics, and hazards. In this presentation, I describe how machine learning (ML) is transforming our science. I introduce how we use ML to facilitate the rapid extraction of Earth’s elastic response by reconstructing waves buried within continuous ground vibration, i.e., ambient noise. I also explain how we apply probabilistic inverse theory to construct auto-adaptive maps of Earth’s near-surface structure (i.e., ML applied to the problem of model representation).
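
The core of ambient-noise analysis can be illustrated with a toy example (invented signals, not the group's actual pipeline): cross-correlating noise recorded at two sensors recovers the travel time of waves between them.

    import numpy as np

    # Two sensors record the same random noise field; sensor B receives it
    # 0.5 s after sensor A. Cross-correlating the two records recovers that
    # travel time, the basic idea behind ambient-noise interferometry.
    fs = 100.0                              # sampling rate, Hz
    delay = int(0.5 * fs)                   # true 0.5 s propagation delay
    source = np.random.randn(5_000)
    rec_a = source + 0.2 * np.random.randn(source.size)
    rec_b = np.roll(source, delay) + 0.2 * np.random.randn(source.size)

    xcorr = np.correlate(rec_b, rec_a, mode="full")
    lags = np.arange(-source.size + 1, source.size)
    print(f"recovered delay: {lags[np.argmax(xcorr)] / fs:.2f} s (true: 0.50 s)")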

Fast Solvers: Algorithms and Applications in Science and Engineering

Mark F. Adams

A goal of this talk is to provide a view of the field of high-performance computing (HPC), drawing on experience in developing multigrid solvers, general algorithms, and specific applications within the Department of Energy (DOE) for the scientific computing community via the PETSc (Portable, Extensible Toolkit for Scientific Computation) library. The principles of traditional fast (geometric) multigrid solvers will be introduced with a new HPC benchmark, HPGMG, and an advanced, low-communication multigrid algorithm for extreme scale, segmental refinement, will be discussed. Applications of algebraic multigrid methods in PETSc to problems such as bones, tires, ice, and fusion plasmas, both within and outside of DOE, will be presented. The PETSc numerical library framework and community will be introduced, as well as the DOE software funding model and future directions of the HPC software stack.
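
PETSc's multigrid is far more general than any short example; as a minimal sketch of the two ingredients of geometric multigrid (smoothing plus coarse-grid correction), here is a toy NumPy V-cycle for the 1D Poisson problem, written for this page rather than taken from PETSc:

    import numpy as np

    # Toy geometric multigrid V-cycle for -u'' = f on (0, 1) with zero
    # Dirichlet boundaries. Arrays include both boundary points.

    def residual(u, f, h):
        r = np.zeros_like(u)
        r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / h**2
        return r

    def smooth(u, f, h, sweeps=3, omega=2/3):
        # Weighted Jacobi: damps high-frequency components of the error.
        for _ in range(sweeps):
            u[1:-1] += omega * 0.5 * (h**2 * f[1:-1] + u[:-2] + u[2:] - 2 * u[1:-1])
        return u

    def restrict(r):
        # Full-weighting restriction onto every other grid point.
        rc = np.zeros((len(r) - 1) // 2 + 1)
        rc[1:-1] = 0.25 * (r[1:-2:2] + 2 * r[2:-1:2] + r[3::2])
        return rc

    def prolong(e):
        # Linear interpolation back to the fine grid.
        ef = np.zeros(2 * len(e) - 1)
        ef[::2] = e
        ef[1::2] = 0.5 * (e[:-1] + e[1:])
        return ef

    def v_cycle(u, f, h):
        if len(u) <= 3:                      # coarsest grid: one interior point
            u[1] = 0.5 * h**2 * f[1]         # exact solve
            return u
        u = smooth(u, f, h)                  # pre-smooth
        coarse_err = v_cycle(np.zeros((len(u) - 1) // 2 + 1),
                             restrict(residual(u, f, h)), 2 * h)
        u += prolong(coarse_err)             # coarse-grid correction
        return smooth(u, f, h)               # post-smooth

    x = np.linspace(0.0, 1.0, 2**7 + 1)
    h = x[1] - x[0]
    f = np.pi**2 * np.sin(np.pi * x)         # exact solution: sin(pi * x)
    u = np.zeros_like(x)
    for _ in range(10):
        u = v_cycle(u, f, h)
    print("max error vs. exact solution:", np.abs(u - np.sin(np.pi * x)).max())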

Elementary Machine Learning

Kenny Davila

An introductory lecture that will cover the basic theoretical concepts of Machine Learning for non-experts. The latest version of the Anaconda distribution of Python 3 will be required.
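
As a taste of the kind of material such a lecture builds toward, a minimal supervised-learning example in scikit-learn (which ships with the Anaconda distribution mentioned above) fits a classifier and measures accuracy on held-out data; the dataset and model choices here are illustrative only:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)

    # Hold out a test set so accuracy is measured on data the model never
    # saw, the basic safeguard against overfitting.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    print(f"test accuracy: {model.score(X_test, y_test):.2f}")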

SQL Analytical Functions

Pavan Mulgund

Analytical functions in SQL take you beyond traditional approaches to querying data for business intelligence. They make complex post-query processing simpler through clearer and more concise code. Other advantages of analytical SQL functions include improved query speed, more manageable code, and a reduced learning effort. A basic understanding and hands-on knowledge of SQL queries is a prerequisite to learning SQL analytical functions.

Some of the most widely used analytical functions in SQL include the following:

  • Basic concepts in Analytical functions: PARTITION BY, ORDER BY, Windows functions (RANGE, INTERVAL DAY/MONTH/YEAR, UNBOUNDED, PRECEDING, FOLLOWING, CURRENT ROW)
  • Ranking of data using ROW_NUMBER, RANK, DENSE_RANK, LAG/LEAD, and RATIO_TO_REPORT
  • Advanced functions for pivoting and unpivoting data (PIVOT/UNPIVOT), filtering top-N results, and aggregating hierarchical data (CUBE/ROLLUP/GROUPING SETS clauses)
  • Statistical/Linear Regression Functions

Apart from learning the definitions of the above functions and where they can be used, it is equally important to practice using them to understand their application.

Prerequisite – Knowledge of basic SQL, including joins and subqueries.
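
As a small, hedged illustration of the windowing concepts listed above, the sketch below uses Python's built-in sqlite3 module (SQLite 3.25 or newer supports window functions; PIVOT/UNPIVOT and RATIO_TO_REPORT are vendor-specific and not shown). All data are invented:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
        ("East", "Ann", 120), ("East", "Bob", 200), ("East", "Cy", 200),
        ("West", "Dee", 90),  ("West", "Eli", 150),
    ])

    # PARTITION BY restarts each calculation per region; ORDER BY defines the
    # ranking order and the meaning of "previous row" for LAG.
    rows = con.execute("""
        SELECT region, rep, amount,
               RANK()       OVER (PARTITION BY region ORDER BY amount DESC) AS rnk,
               DENSE_RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS drnk,
               LAG(amount)  OVER (PARTITION BY region ORDER BY amount DESC) AS prev
        FROM sales
    """).fetchall()
    for row in rows:
        print(row)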

What 1936 Can Teach Us about 2016 (and Possibly 2020): Poll Aggregation, the Limits of Big Data, and Another Look at Nonresponse Bias

Jacob Neiheisel

Public opinion polling’s reputation took a sizable hit in the wake of the 2016 elections, as statewide polls in key battleground states ended up understating popular support for Donald Trump. Forecasting models that relied heavily on national polls similarly missed the mark. In response to these apparent shortcomings, the American Association for Public Opinion Research (AAPOR) conducted a thorough post-mortem review of what went wrong with the polls in 2016. In this talk, I provide an overview of the AAPOR report and situate its findings within the context of a similar effort written in response to another (in)famous polling disaster—the 1936 Literary Digest Poll. I also take this opportunity to provide a preliminary analysis of nonresponse bias attributable to differential levels of distrust in the media—a potential contributor to polling errors in 2016 that went largely overlooked by the AAPOR report. Finally, I touch upon issues related to poll aggregation of the type popularized by sites like FiveThirtyEight and Pollster.com, incorporating this discussion into a broader conversation about the limits of big data approaches to measuring public opinion.
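
For readers unfamiliar with poll aggregation, a minimal sketch (with invented numbers) is a sample-size-weighted average; real aggregators also adjust for house effects, recency, and pollster quality, and no amount of pooling can remove a bias shared by every poll:

    import math

    # Invented polls: (candidate share in percent, sample size).
    polls = [(48.0, 900), (51.0, 1200), (49.5, 600), (50.2, 1500)]

    n_total = sum(n for _, n in polls)
    estimate = sum(share * n for share, n in polls) / n_total  # weight by size
    p = estimate / 100
    moe = 1.96 * math.sqrt(p * (1 - p) / n_total) * 100        # rough 95% margin

    # If every poll misses the same voters (nonresponse bias), the aggregate
    # inherits the error, one of the limits of big data discussed in the talk.
    print(f"pooled estimate: {estimate:.1f}% +/- {moe:.1f}")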

Unpacking Irrationality and Volatility in Canadian Federal Elections

Nik Nanos

The small margins that cause important shifts in election outcomes reveal the fragility of contemporary democracies. This talk takes Canadian Federal Elections as a case study for exploring the rationality and irrationality of citizens in the decision-making process, given external influences such as access to information and globalization.

The Impact of Changing Technology on Survey Research and the Implications for Social Science Research

Andrew Smith

Since its inception in the early years of the 20th century, survey research has adapted its standard procedures to changes in both theory and technology. Since most social science survey research involves assessment of attitudes rather than predictions of future behavior, election polling has been one of the few topics that allows researchers to compare survey predictions with actual outcomes. Methodological adaptations have occurred largely because of the failure of prevailing methods to accurately predict election outcomes. The shift from large convenience samples in straw polls to quota samples occurred after the failure of the Literary Digest straw poll to predict the 1936 election, while Gallup, Roper, and Crossley correctly predicted the winner. A similar methodological change, from quota sampling to probability sampling, occurred after the failure of polls to predict the 1948 election. Over this same period, the widespread adoption of telephones in the US led researchers to largely abandon face-to-face interviewing and adopt the telephone as the primary means of data collection.

In the wake of the perceived failure of polls to predict the 2016 presidential election, the field of survey research is in the midst of a paradigm shift that is both theoretical and technological. Lower response rates to telephone surveys have made them cost-prohibitive for most media organizations, and over the past 20 years there has been increased use of low-cost, web-based surveys. Furthermore, the rise of “big data” analysis for consumer research has caused survey researchers to reexamine and change their methods once more. This paper examines this history and explores the implications of the current paradigm shift for the future of social science research.

TED-Style Talks Abstracts

Topic for all discussions: Data Science for Businesses

Larry Megan

The profound impacts of data science and artificial intelligence are not limited to 21st-century digital companies. Linde’s history is that of a traditional, 110-year-old manufacturing company, yet data-driven digitalization is a key part of the company’s strategy. In industrial companies, technology alone will not help the company prosper. Rather, there needs to be a deep recognition that such transformations are less about technology and more about changing how people work. This talk will therefore focus on the people on the front lines of the business and, instead of simply discussing data science applications, will also discuss how creating a positive human experience is essential to success.

Event Start Date: March 30, 2020