New this year — we have a fantastic lineup of hands-on tutorials! There are two parallel sessions of half-day and full-day tutorials on Tuesday, a full-day tutorial on Wednesday, and a half-day tutorial on Thursday afternoon. The tutorials are part of the ICSE program: if you are registered for ICSE week on Tue/Wed/Thur then you are free to attend that day’s tutorials.
These are hands-on tutorials, so bring your laptop. Check the prerequisites for each tutorial: some ask you to download software or set up accounts in advance. Come prepared to learn something new!
Running Applications on Kubernetes
Time: Tuesday, May 28, 9:00-12:30
Kubernetes (k8s) is an open-source system for automating deployment, scaling, and management of containerized applications. Kubernetes adds an abstraction layer over infrastructure, whether that infrastructure is on-premises, in the cloud, or hybrid, so that workloads can be run in, and migrated to, the most appropriate deployment environment.
This workshop is an introduction to running applications on Kubernetes. The workshop includes a mix of lecture and hands-on labs. Participants will package a Java application into a Docker container, and then deploy and manage that application in Kubernetes. The workshop uses a number of services provided by Google Cloud Platform, but the concepts are portable to any environment.
During this session, participants will:
- Package a simple, sample application into a Docker container
- Create a Kubernetes cluster
- Deploy the sample application in the Kubernetes cluster
- Scale the application, both up and down
- Upgrade the application
- View information and statistics about the application and k8s cluster
By the end of this workshop participants will be able to:
- Describe various features and benefits of Kubernetes
- Deploy and manage applications on Kubernetes
Nathen Harvey has been active in the DevOps community for almost a decade, putting the practices of DevOps to work and helping others learn and implement those practices. As a Cloud Developer Advocate at Google, he helps the community understand and apply DevOps and Site Reliability Engineering (SRE) principles and practices in the cloud to drive business outcomes. Prior to joining Google, Nathen led the Chef community, which he helped adopt continuous automation to build, deploy, and manage applications in fast, secure ways. He also has a background in running operations and infrastructure for a diverse range of web applications. Nathen is a co-host of the Food Fight Show, a podcast about Chef and DevOps.
Participants should bring a wifi-enabled laptop to the workshop. Participants will be given access to Google Cloud Platform for all of the labs. Participants will get the most out of the workshop if they have some familiarity and comfort with the following:
- Basic web administration
- Basic system administration
- Working on the command line
Train a model with TensorFlow and run it in the browser
Time: Tuesday, May 28, 14:00-17:30
Josh Gordon and Robert Crowe
Please join us for an introductory TensorFlow workshop, taught at the beginner level. In this half-day session, we’ll introduce TensorFlow, then work through several simple exercises using tf.keras, TensorFlow’s latest and easiest-to-use high-level API. We will train a few models in Python, then work through each step needed to run a model in the browser (entirely client side, and interactively!) using TensorFlow.js. Next, we will serve our model using a REST API. The tutorial assumes basic prior machine learning experience, but it is taught at the beginner level, and you can probably follow along even if you’re entirely new to the field. We’ll be available afterwards for discussions and deeper technical content if you have anything you’d like to chat about. There is no software to install; we’ll do everything in Colab.
Josh Gordon works on the TensorFlow team at Google, and teaches Deep Learning at Columbia University. He has over a decade of machine learning experience to share. You can find him on Twitter at @random_forests.
Robert Crowe is a recovering data scientist and TensorFlow addict. Robert has a passion for helping developers quickly learn what they need to be productive. He’s used TensorFlow since the very early days and is excited about how it’s evolving quickly to become even better than it already is. Before moving to data science Robert led software engineering teams for both large and small companies, always focusing on clean, elegant solutions to well-defined needs. In his spare time, Robert surfs, sails, and raises a family.
Scale-Out Data Science with R and Python
Time: Tuesday, May 28, 9:00-17:30
Tomas Singliar, Mario Inchiosa, John-Mark Agosta, Hang Zhang
Hands-on tutorial duration: 6 hours (2x 3-hour sessions)
Target audience: Intermediate level in knowledge and practice of machine learning, R, and Python
Python and R dominate the domain of data science software. However, when it comes to scalable analysis, or deployment of trained models into production, barriers still exist. Many data scientists are hindered by a limited suite of available functions for handling large datasets efficiently, and by limited knowledge of the appropriate computing environments for scaling R and Python scripts from desktop analysis to elastic and distributed cloud services. Another productivity limitation is the tedium of the experimentation loop in which the right preprocessing, model, and hyperparameters are found.
In this tutorial, we will demonstrate how to create scalable machine learning pipelines in R and Python, with emphasis on scaling on Spark clusters. We will model the data science journey by first prototyping locally and then showing how to move the data science process to the cloud, to exploit the larger compute resources and data colocation that various Spark implementations offer. In particular, attendees will see how to build, persist, and consume machine learning models using distributed machine learning functions in Python and R. Armed with a distributed computing platform, we will show how Microsoft’s AutoML library can automate the search for the best model.
We will provide hands-on exercises drawing on recent examples from time series forecasting, Active Learning, and Reinforcement Learning. Code samples will be available in a public GitHub repository. Spark and AzureML Compute clusters will be the target distributed platforms; participants will do exercises on Data Science Virtual Machines using RStudio and Jupyter notebooks.
- Scaling up your data science process - issues and solutions
- What limits the scalability of your code in the face of large data? What techniques can be used to overcome those limits? What libraries can you use in Python? In R?
- What limits your modeling productivity? How do you navigate the space of modeling choices: preprocessing sequences, models, and hyperparameters?
Hands-on exercises and demonstrations:
- End to end scalable data process
- Data exploration, wrangling, visualization, modeling and deployment on single node Data Science Virtual Machines and Spark clusters
- Scalable analysis on single nodes: Analysis with data on disk, in-database, and in Spark
- Distributed model search and parameter optimization in Python with AutoML
- Deployment of ML models as web service APIs with the Azure ML Python SDK, with parallel scoring on an elastic cluster
Tomas Singliar is a Principal Data Scientist in the Azure ML group in Microsoft AI Platform's AutoML team. He works on automated search for the best forecasting models, where his experience architecting the Azure ML Python Package for Forecasting comes in handy. Tomas's favorite hammer is probabilistic and Bayesian modeling, which he applies analytically and predictively to business data. His favorite anvil is cloud data stores, especially MPP SQL databases and data lakes. He studied machine learning at the University of Pittsburgh. Tomas has published a dozen papers in, and serves as a reviewer for, several top-tier AI conferences (AAAI, UAI, etc.). He holds four patents in intent recognition through inverse reinforcement learning. Contact information: Tomas.Singliar@microsoft.com
Dr. Mario Inchiosa’s passion for data science and high-performance computing drives his work as Principal Software Engineer in Microsoft Cloud + AI, where he focuses on delivering advances in scalable advanced analytics, machine learning, and AI. Previously, Mario served as Revolution Analytics’ Chief Scientist and as Analytics Architect in IBM’s Big Data organization, where he worked on advanced analytics in Hadoop, Teradata, and R. Prior to that, Mario was US Chief Scientist in Netezza Labs, bringing advanced analytics and R integration to Netezza’s SQL-based data warehouse appliances. He also served as US Chief Science Officer at NuTech Solutions, a computer science consultancy specializing in simulation, optimization, and data mining, and Senior Scientist at BiosGroup, a complexity science spin-off of the Santa Fe Institute. Mario holds Bachelor’s, Master’s, and PhD degrees in Physics from Harvard University. He has been awarded four patents and has published over 30 research papers, earning Publication of the Year and Open Literature Publication Excellence awards. Contact information: firstname.lastname@example.org.
John Mark Agosta leads a team that is expanding the machine learning and artificial intelligence capabilities of Microsoft Azure. He recently joined Microsoft, which, if he were smarter, he would have done earlier in his career: a career that involved working with startups and labs in the Bay Area, in such areas as “The Connected Car 2025” at Toyota ITC, sales opportunity scoring at Inside Sales, malware detection at Intel, and automated planning at SRI. At Intel Labs, he was awarded a Santa Fe Institute Business Fellowship in 2007. He has over 30 peer-reviewed publications and 6 accepted patents. His dedication to probability and its applications is shown by his participation in the annual Uncertainty in AI conference since its inception in 1985. When feeling low, he recharges his spirits by singing Russian music with Slavyanka, the Bay Area’s Slavic music chorus. Contact information: email@example.com.
Dr. Hang Zhang is a Principal Data & Applied Scientist in the Commercial Software Engineering team at Microsoft. He is also an affiliated professor at the University of Washington. His technical domains of interest include big data IoT, scalable data science and machine learning frameworks, computer vision, etc. Before joining Microsoft in 2014, Hang had stints at Walmart Labs and Opera Solutions, leading a team building internal tools for search analytics and business intelligence and focusing on machine learning. Hang has a Ph.D. in Industrial and Systems Engineering and an M.S. in Statistics from Rutgers, The State University of New Jersey. He is a Senior Member of IEEE. Contact information: firstname.lastname@example.org.
Participants should come to the sessions with access to an Azure subscription. You can use Azure’s free tier.
R for Software Engineering Research
Time: Wednesday, May 29, 11:00-17:30
This one-day tutorial will introduce participants to the statistical programming language R, and to a set of tools known as the “tidyverse” that can be used to load, clean, explore, analyze, and visualize complex data. Examples will be drawn from software engineering datasets; participants must be comfortable programming, but do not need previous exposure to R.
Dr. Greg Wilson has worked for 35 years in both industry and academia, and is the author or editor of several books on computing and two for children. He is best known as the co-founder of Software Carpentry, a non-profit organization that teaches basic computing skills to researchers, and is now part of the education team at RStudio.
Participants should have RStudio installed on their laptops.
Time: Thursday, May 30, 14:00-17:30
Università della Svizzera italiana
In this hands-on tutorial, Michele Lanza, a Professor of Software Engineering at the Università della Svizzera italiana (USI) in Lugano, Switzerland and a frequent invited speaker, will explore numerous best practices in preparing and delivering presentations. Themes include how to structure your presentation, effective design of presentation material, how to engage your audience, body language, and the importance of rehearsals. Participants might be invited to prepare short research presentations to be studied and critiqued during the tutorial.
Michele Lanza is a professor at the Faculty of Informatics of USI, Università della Svizzera italiana, in Lugano, Switzerland. He co-founded said faculty in 2004, and has led the REVEAL research group (http://reveal.si.usi.ch) ever since, working in the areas of software engineering, evolution, visualization, and analytics. Since 2016 REVEAL has had two heads, the other one being Prof. Gabriele Bavota. In 2017 Michele founded the Software Institute (SI, https://si.usi.ch), which he directs against all odds, together with a wild and keen bunch of software fetishists. Since 2016 he has also been pro-rector of USI. In essence, he doesn’t own a single hat, but he likes to wear many of them. Makes him look fancy.
His doctoral dissertation, completed in 2003 at the University of Bern under the guidance of Prof. Oscar Nierstrasz and Dr. Stéphane Ducasse (yep, Michele used to be lucky), received the Ernst Denert award for best thesis in software engineering of 2003. He also received the Credit Suisse Award for best teaching in 2007 and 2009. He co-authored roughly 200 peer-reviewed publications, and the book “Object-Oriented Metrics in Practice”. In short, a medium-rare h-index and some citation potatoes as side dish. Rumour has it that he’s pretty darn good at giving presentations, but who knows?