Rob DeLine, a Principal Researcher at Microsoft Research, has spent the last thirty years designing programming environments for a variety of audiences: end users making 3D environments (Alice); software architects composing systems (Unicon); professional programmers exploring unfamiliar code (Code Thumbnails, Code Canvas, Debugger Canvas); and, most recently, data scientists analyzing streaming data (Tempe). He is a strong advocate of user-centered design and founded a research group applying that approach to software development tools. This approach aims for a virtuous cycle: conducting empirical studies to understand software development practices; inventing technologies that aim to improve those practices; and then deploying these technologies to test whether they actually do.
Title: We won! So what?
To quote our research community’s succinct mission statement: “The Mining Software Repositories (MSR) field analyzes the rich data available in software repositories to uncover interesting and actionable information about software systems and projects.” In the earliest days of this conference, this mission was a novel possibility that the flourishing Open Source movement created. These days, however, the practice of turning repository data into actionable insights and deployed models has become bog standard. So, congratulations to the MSR community for leading the way! But now what? MSR finds itself caught in a heated competition among industry researchers and data scientists to find novel ways to exploit data and apply models. Given the resources and energy that industry now invests in data science and machine learning, MSR cannot hope to succeed by working on the same types of problems, using the same techniques. It’s time to pivot. Luckily, there are hard open problems for which industry is hungry for results: How can we continue to get insights and build models while upholding privacy laws (e.g., GDPR) and user privacy preferences? How can we make trained models understandable to all relevant stakeholders? How can we ensure that our insights and models are not harmed by human biases like sexism, racism, political manipulation, etc.? The first half of this talk will describe current industry practice in data science and machine learning, based on recent studies. In the second half, I’ll describe some difficult new problems, to prod energetic discussion about the future direction of MSR.