2018 CDSE Days

April 9 - 13, 2018 | University at Buffalo | Buffalo, NY


Bringing some of the nation's most prominent scholars of data-enabled science to Buffalo for a week of workshops, lectures and networking.


Program Schedule

Monday, April 9th

8:30 am - 12:00 pm | Workshop - Python and Jupyter for Programmers | Student Union 201

Presented by: John Ringland
Format: Workshop
A hands-on introduction to Python and the Jupyter Notebook for those with experience programming in other languages. The instructor will show a wide variety of applications. Participants must bring a laptop with an up-to-date installation of the Anaconda distribution of Python 3.

1:00 pm - 3:00 pm | Workshop - Discussing Data: Presenting Technical Information to a General Audience | Jacobs Hall 140 & 146

Presented by: Thealexa Becker
Format: Workshop
Working with large or complex data presents numerous challenges before analysis even begins. How can we become fluent about the data being used? How can we curate our data for easier use and understanding by other researchers? And most importantly, how can we effectively communicate essential information about our data to a general audience? This presentation will discuss these considerations in the context of the Federal Reserve Bank of Kansas City’s Data Museum project.

3:30 pm - 4:30 pm | Data science with multilayer networks: Mathematical foundations and applications | Davis Hall 101

Presented by: Dane Taylor
Pre-Keynote Address
Complex networks are a natural representation for datasets describing biological, social and information systems, and it is common practice to gain insights by studying structural patterns in these networks. Two popular examples include centrality analysis (whereby one ranks different parts of the network according to their relative importance) and community detection (whereby one seeks to find clusters). In this talk, I will discuss extensions of these pursuits for multilayer networks that consist of network layers encoding different types of connections, such as categorical social ties (friendships, colleagues, etc.) or a network at different instances in time. I will introduce new methodologies and explore their application to diverse datasets such as the United States Ph.D. exchange in mathematics, co-starring relationships among top-billed actors during the Golden Age of Hollywood, citations between decisions from the United States Supreme Court, and data-fusion for the Human Microbiome Project. I will also highlight how mathematical theory development can improve our understanding of these endeavors, which helps close the gap between the popular heuristics in place and the development of theory-supported methods derived from first principles in mathematics and statistics.

4:30 pm - 5:30 pm | Keynote Speaker: Lang Li | Translational Pharmacoinformatics: the convergence of innovations and beyond | Davis Hall 101

Presented by: Lang Li
Keynote Address
Pharmacoinformatics drives a new era of translational biomedical research. It translates drug effects from epidemiological discoveries in health record data to their pharmacological mechanisms through pre-clinical experiments, and vice versa. Our pioneering pharmacoinformatics research successfully tested and validated that the interaction between simvastatin and loratadine increased myopathy risk in both the Indiana Network for Patient Care (INPC) database and the FDA Adverse Event Reporting System (FAERS). We further demonstrated their pharmacodynamic interaction mechanism in rat myocytes. Recently, we successfully validated the epidemiological evidence for an increased risk of myopathy via the three-drug interaction among omeprazole, fluconazole, and clonidine. We then corroborated the pharmacokinetic model by showing that the three-drug interaction is due to increased omeprazole exposure caused by fluconazole and clonidine inhibiting its CYP3A and CYP2C19 metabolic pathways. Using these examples, we will demonstrate that innovative pharmacoinformatics research is built upon the synergies among informatics, pharmaco-epidemiology, statistics, computer science, pharmacometrics, and systems pharmacology. Dr. Li will also highlight some new pharmacoinformatics research areas and their translational impacts.

Tuesday, April 10th

9:00 am - 12:00 pm | Workshop - Case studies with MRGsolve: PBPK and QSP model implementation and utilization in R | Kapoor Hall 125

Presented by: Matthew Riggs and Kyle Baron
Format: Workshop
This workshop will focus on (1) an MRGsolve overview and brief tutorial; (2) a PBPK model characterizing drug-drug interactions between HMG-CoA reductase inhibitors and cyclosporine (Yoshikado et al. 2016; PMID 27170342), including sensitivity analyses and parameter estimation; and (3) evaluation of combination chemotherapy regimens using a QSP model for MAP kinase signaling in colorectal cancer (Kirouac et al. 2017; PMID 28649441), including creating complicated input data sets and options for parallelizing simulations.

1:00 pm - 3:00 pm | Workshop - SQL Fundamentals | Jacobs Hall 140 & 146

Presented by: Joana Gaia
Format: Workshop
As data collection has increased exponentially, so has the need for people skilled at using and interacting with data. One of the skills needed to interact with data is being able to communicate with database systems. Database systems use their own language to communicate: SQL. This course is designed to give an overview of the fundamentals of SQL and working with data. It starts with the basics and assumes no prior knowledge or skills in SQL. In addition to setting this foundation, the workshop will also discuss methods to create tables, move data into them, filter that data or combine it with other data.
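For readers who want a preview of the operations named above (creating tables, loading data, filtering, and combining), here is a minimal sketch using Python's built-in sqlite3 module. It is not workshop material: the table names, column names, and sample rows are hypothetical, and the workshop may use a different database system.

```python
import sqlite3

# In-memory database; all table and column names below are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create tables.
cur.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, title TEXT)")
cur.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)"
)

# Move data into them.
cur.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(1, "Mathematics"), (2, "Geology")],
)
cur.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Ana", 1), (2, "Ben", 2), (3, "Chi", 1)],
)

# Filter: keep only the students in department 1.
math_students = cur.execute(
    "SELECT name FROM students WHERE dept_id = 1 ORDER BY name"
).fetchall()

# Combine: join each student with the title of their department.
joined = cur.execute(
    "SELECT s.name, d.title "
    "FROM students AS s JOIN departments AS d ON s.dept_id = d.id "
    "ORDER BY s.id"
).fetchall()

conn.close()
```

After running, `math_students` holds the filtered rows and `joined` holds one (name, department title) pair per student.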

3:30 pm - 4:00 pm | Dense subgraphs with hierarchical relations: Models, Algorithms, Applications | Davis Hall 101

Presented by: Erdem Sariyuce
Format: Presentation
Finding dense substructures in a network is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization, to name a few. Yet most standard formulations of this problem (like clique, quasi-clique, densest at-least-k subgraph) are NP-hard. Furthermore, the goal is rarely to find the “true optimum” but to identify many (if not all) dense substructures, understand their distribution in the graph, and ideally determine relationships among them. Current dense subgraph finding algorithms usually optimize some objective and only find a few such subgraphs without providing any structural relations. In this talk, I will present a framework that we designed to find dense regions of the graph with hierarchical relations. Our model can summarize the graph as a tree of subgraphs. In each subgraph, smaller cliques are present in many larger cliques. With the right parameters, our framework generalizes the classic notions of k-core and k-truss decompositions, two widely accepted dense subgraph models. We present practical algorithms for our framework and empirically evaluate their behavior in a variety of real graphs. Furthermore, we adapt our framework for bipartite graphs, which are used to model group relationships such as author-paper, word-document and user-product data. Our experiments show that we can identify dense structures that are lost within larger structures by other methods and find further finer-grained structure within dense groups. We demonstrate how the proposed algorithms can be applied to the analysis of a citation network between physics papers, an author-paper network of database conferences, and a user-product network of Amazon Kindle books.
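As background for the abstract above, the classic k-core decomposition it mentions can be sketched with a simple peeling procedure: repeatedly remove a vertex of minimum remaining degree and record the largest minimum degree seen so far as that vertex's core number. This is only an illustration of the standard k-core notion, not the speaker's framework; the function name `core_numbers` and the toy graph are hypothetical, and the loop favors clarity over speed (it is quadratic in the number of vertices).

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core number} for an undirected graph given as edge pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a vertex with the smallest remaining degree.
        node = min(remaining, key=degree.__getitem__)
        # Core numbers never decrease along the peeling order.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


# Toy graph: a triangle a-b-c with a pendant vertex d attached to c.
toy_cores = core_numbers([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
```

In the toy graph the triangle vertices end up with core number 2 and the pendant vertex with core number 1, so the 2-core is the triangle nested inside the 1-core that is the whole graph. That nesting is the simplest example of dense subgraphs related by a hierarchy.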

4:00 pm - 4:50 pm | Keynote Speaker: Roger Ghanem | Statistical Sampling on Manifolds for Expensive Computational Models | Davis Hall 101

Presented by: Roger Ghanem
Keynote Address - refreshments served
Increasingly, critical decisions are demanded for situations where likelihoods are not sufficiently constrained by models. This could be caused by the lack of suitable mathematical models or the inability to compute the behavior of these models, or observe the associated physical phenomena, under a sufficient number of operating conditions. In many of these situations, the criticality of the decisions is manifested by the need to make inferences on high consequence events, which are typically rare. The setting is thus one of characterizing extreme events when useful models are lacking, computational models are expensive, or empirical evidence is sparse. We have found adaptation and learning to provide transformative capabilities in all of these settings. A key observation is that models and parameters are typically associated with comprehensive constraints that impose conservation laws over space and time, whose solution yields spatio-temporal fields, and that require comprehensive calibration with exhaustive data. Decisions typically depend on quantities of interest (QoI) that are agnostic to this complexity and that are constructed through an aggregation process over space, time, or behaviors. A regularization is thus imposed by allowing the QoIs to drive the complexity of the problem. But then one has to learn the QoIs. This talk will describe recent procedures for probabilistic learning of QoIs on diffusion manifolds. The method is demonstrated on problems in science and engineering where models are either too expensive to compute or too inconclusive to provide acceptable interpolation to data. Probabilistic inferences are then possible as required by risk assessment and probabilistic-based design.

Wednesday, April 11th

9:00 am - 10:30 am | Workshop - CCR OnDemand: UB's One Stop Shop for High Performance Computing | Student Union 330

Presented by: Shawn Matott
Format: Workshop
CCR OnDemand is a convenient web portal for accessing all aspects of UB's high performance computing resources. The easy-to-use interface allows users to upload and download files, create, edit, submit, and monitor jobs, run GUI applications, and connect via SSH. All of this can be done via a web browser, with no client software to install and configure. The presentation will demonstrate some of the unique features of CCR OnDemand and compare it with alternative approaches that can require installing and using a variety of disparate programs like PuTTY, FileZilla, Xming, VNC, etc. Participants who wish to follow along with the workshop demos should bring a laptop and register for a CCR user account - see ccr.buffalo.edu/support/ccr-help/accounts.html

10:30 am - 12:30 pm | Workshop - Immersive Visual Data Analysis | Student Union 330

Presented by: Oliver Kreylos
Format: Workshop
Immersive visualization using virtual reality (VR) display technology offers tremendous benefits for the visual analysis of complex three-dimensional data like those commonly obtained from geophysical and geological observations and models. Unlike "traditional" visualization, which has to project 3D data onto a 2D screen for display, VR can side-step this projection and display 3D data directly, in a pseudo-holographic (head-tracked stereoscopic) form, and therefore does not suffer from the distortions of relative positions, sizes, distances, and angles that are inherent in 2D projection. As a result, researchers can apply their spatial reasoning skills to virtual data in the same way they can to real objects or environments.

This workshop will present VR methods for data analysis that have been developed at the UC Davis W.M. Keck Center for Active Visualization in the Earth Sciences (KeckCAVES), and will focus on low-cost commodity VR display systems such as the HTC Vive VR headset.

1:00 pm - 3:00 pm | Workshop - LaTeX, GitHub and More | Student Union 330

Presented by: Paul Bauman and Matt Knepley
Format: Workshop
Computational science is now dependent on a shared software infrastructure which enables the use of cutting-edge hardware, optimal algorithms, and sophisticated data analysis and visualization strategies. In this tutorial, we will enable students to use the Git system for version control (including a useful policy layer), configuration and build tools, and the LaTeX documentation system.

Thursday, April 12th

11:30 am - 12:30 pm | The Role of Spreadsheet Software in the Current Academic and Non-Academic Environments: A First Line of Defense Against the Deluge of Big, Fast, Varied, Dirty and Valuable Data | Student Union 201

Presented by: Joaquin Carbonara
Format: Presentation
Data is hailed as the new oil. Data, raw or pre-processed, is already the media format of choice for low to high-range quants and even non-quantitative information workers. Most information is, or will be in the very near future, available in the form of data, some of it stored as tables that can be viewed in a spreadsheet format. In addition, data is becoming a game changer in the consumer market. A revolution comparable in size to, but growing much faster than, the Internet revolution of the early nineties is happening now. In fact, digital data is already a part of almost everyone’s life at work and at home. This presentation will highlight and give examples of the relevance and shortcomings of spreadsheets as part of the data information revolution.

1:00 pm - 3:00 pm | Workshop - Blockchain Programming: A Hands-on Tutorial | Student Union 201

Presented by: Bina Ramamurthy
Format: Workshop
We will work on problem solving and programming on the Ethereum blockchain. This tutorial will focus on (i) smart contract development using the Solidity language in the Remix Integrated Development Environment (IDE) and (ii) Decentralized Application (Dapp) development using Truffle IDE. You will work on a virtual machine image preloaded with the required software. Remix is a web IDE.

Prerequisites: Basic knowledge of an object-oriented language or any high level language coding skills.

Hardware: A laptop with at least 4 GB of RAM; with less, things will run more slowly. Please install VirtualBox prior to the workshop.

3:30 pm - 4:00 pm | Massive Mountains, Outlier Volcanoes, and Large Earthquakes: Using High Performance Computing and Immersive 3D Visualization for Scientific Discovery | Davis Hall 101

Presented by: Margarete Jadamec
Format: Presentation
In the Earth Sciences, the modeling of tectonic plate boundaries has traditionally been approached through parameter sweeps on generalized problems or global representations at coarse resolution. Advances in high performance computing and access to large data volumes define a new computational landscape, within which the next generation of fluid dynamics simulations of solid state deformation in the Earth can be both formulated and run. Here I present the first three-dimensional configuration of the Alaska tectonic plate boundary, the site of the tallest mountain in North America (Denali), the second-largest recorded earthquake (Great Alaska Earthquake), and anomalous volcanic edifices (the Wrangell volcanics). The data-assimilated three-dimensional configuration is used in large-scale numerical simulations of non-linear viscous flow of the Alaska plate boundary, comprising over 400 million unknowns and requiring over 20,000 CPU hours per simulation. By constructing a detailed model of this kind, we provide the first comprehensive, self-consistent explanation of the mechanisms generating Denali, the Great Alaska Earthquake, and the Wrangell volcanics. As we move into the frontier of exascale computing, 3D immersive visualization plays an increasingly important role. Examples of the critical role 3D immersive virtual reality can play in leveraging high-performance computing for scientific discovery are also shown.

4:00 pm - 4:50 pm | Keynote Speaker: Robert Harrison | A Sustainable Model for Scientific Simulation in the Exascale Era | Davis Hall 101

Presented by: Robert Harrison
Keynote Address - refreshments served
As we progress towards and beyond exascale computation, disruptive changes are causing many people to question whether our current approaches to developing software for science and engineering are sustainable. In particular, can we deliver to the world the full benefits expected from high-performance simulation? Or is innovative science being stifled by the increasing complexities of all aspects of our problem space (rapidly changing hardware, software, multidisciplinary physics, etc.)?

Previously successful strategies for maintaining productivity and performance, such as frameworks and expert-written libraries, have been largely destroyed by the disruptive pace of change in architecture and programming models, which will continue and even accelerate for the foreseeable future. Focusing on applications in chemistry, I will discuss many of these issues, including how chemistry has already been forced to adopt solutions that differ quite sharply from those in the mainstream, and how these solutions might position us well for the technology transitions now under way. Other disciplines have also developed relevant tools and (partial) solutions that we can adopt or specialize. More radical changes in how we compute, going all the way back to the underlying numerical representation and algorithms used for the simulation, also promise great enhancements to both developer productivity and the accuracy of simulations. Finally, a communal approach is necessary for a truly sustainable solution, and I will discuss relevant plans of the Molecular Science Software Institute (http://molssi.org).

4:50 pm - 6:30 pm | Poster Session and Reception | Davis Hall 1st Floor Atrium

Hosted by: Abani Patra, Director of the Institute for Computational and Data Sciences (ICDS)
Refreshments, posters, prizes and networking

Friday, April 13th

9:00 am - 9:30 am | Student Modeling for Learning Curve Estimation | Student Union 330

Presented by: Alex Nikolaev
Pre-Keynote Address
In this talk, new methods are presented for student performance prediction, based on the Probabilistic Tensor Factorization approach. The key challenge of modeling and predicting students’ performance lies in the estimation of their conceptual understanding while recognizing both the temporal dynamics of knowledge acquisition and the differences in individual learning abilities. In tackling this challenge, the designed models lead to Likelihood Maximization formulations. The resulting non-convex, constrained optimization problems are solved by the Block Coordinate Descent and Alternating Direction Method of Multipliers algorithms. The inferred parameters are well-interpretable, and hence, can help inform decision-making for intelligent tutoring.

9:30 am - 10:30 am | Keynote Speaker: Shi Jin | Uncertainty Quantification in Kinetic Theory | Student Union 330

Presented by: Shi Jin
Keynote Address
Kinetic equations describe the dynamics of probability density distributions of a large number of particles, and have classical applications in rarefied gas, plasma, and nuclear engineering, and emerging ones in the biological and social sciences. Since they are not first-principles equations, but are usually derived from mean-field approximations of Newton's second law, they inevitably contain uncertainties in the collision kernel, scattering coefficients, initial and boundary data, and forcing and source terms. In this talk we will review a few recent results for kinetic equations with random uncertainties. We will extend hypocoercivity theory, developed for deterministic kinetic equations, to study local sensitivity, regularity, and local time behavior of the solutions in the random space, and also establish corresponding theory for their numerical approximations.

10:30 am - 4:00 pm | UB Symposium on Job and Career Perspectives for Students in the Computational Sciences | Student Union 330

The symposium will address the questions of students who work (or plan to work) in the computational research groups at UB regarding their career prospects once they conclude their university education. Every year, five speakers are invited to this full-day event, and their combined experience has so far covered jobs at BASF, BigDataBio, Bosch, Cryos Net Consultancy, Dow Chemical, Eastman Kodak, HGST/Western Digital, IBM, Kitware, NASA, Nexight Group, NIH, NIST, NREL, Pfizer, PNNL, Q-Chem, Schlumberger/OneSubsea, and Wiley. Speakers have typically conducted computational research during graduate school (including at UB) and then transitioned into employment outside the university domain. They share their experience of finding jobs coming from a computational background, their move from academia to their new environment, and their thoughts on the computational R&D landscape in the industry. They report on the job situation in their chosen profession, give insights into the nature of their work, and talk about opportunities with their specific employers. Each presentation is followed by a Q&A session and we hold a joint panel discussion with all speakers at the end of the symposium. Students have the chance for personal conversations with the speakers during a light lunch. 

2018 presenters include Robert Ashcraft (Samsung), Jon Dorando (Bloomberg Research), Laszlo Seress (The D.E. Shaw Group), Jarod Younker (ExxonMobil), and Sarah Mostame (formerly Intel).

2018 Tentative Agenda

  • 10:30 AM: Opening remarks - Krishna Rajan, ScD; Erich Bloch Chair and Empire Innovation Professor, Department of Materials Design and Innovation, University at Buffalo (tentative)
  • 10:45 AM - 12:45 PM: Presentations
  • 12:45 PM - 1:30 PM: Light lunch for attendees
  • 1:30 PM - 3:00 PM: Presentations
  • 3:00 PM - 4:00 PM: Panel Discussion





Speaker presentations are now up!

Event Start Date: April 9, 2018