MSR 2019
Sun 26 - Mon 27 May 2019 Montreal, QC, Canada
co-located with ICSE 2019
Mon 27 May 2019 15:00 - 15:06 at Centre-Ville - Session X: Building on Data Chair(s): Cor-Paul Bezemer

The popularity of the Python programming language has surged in recent years due to its increasing usage in Data Science. The availability of Python repositories on GitHub presents an opportunity for mining software repository research, e.g., suggesting best practices for developing Data Science applications, identifying bug patterns, and recommending code enhancements. To enable this research, we have created a new dataset that includes 1,558 mature GitHub projects that develop Python software for Data Science tasks. By analyzing metadata and code, we selected projects that use a diverse set of machine learning libraries and are managed by a variety of users and organizations. The dataset is made publicly available through the Boa infrastructure, both as a collection of raw projects and in a processed form that can be used for large-scale analysis with the Boa language. We also present two initial applications to demonstrate the potential of the dataset that could be leveraged by the community.

Conference Day
Mon 27 May

Displayed time zone: Eastern Time (US & Canada)

14:45 - 15:30
Session X: Building on Data (MSR 2019 Data Showcase / MSR 2019 Technical Papers) at Centre-Ville
Chair(s): Cor-Paul Bezemer (University of Alberta, Canada)
14:45
15m
Full-paper
Standing on Shoulders or Feet? The Usage of the MSR Data Papers
MSR 2019 Technical Papers
Zoe Kotti (Athens University of Economics and Business), Diomidis Spinellis (Athens University of Economics and Business)
Pre-print
15:00
6m
Talk
Boa Meets Python: A Boa Dataset of Data Science Software in Python Language
MSR 2019 Data Showcase
Sumon Biswas (Iowa State University), Md Johirul Islam (Iowa State University), Yijia Huang, Hridesh Rajan (Iowa State University)
Pre-print Media Attached
15:06
6m
Talk
A Benchmark of Data Loss Bugs for Android Apps
MSR 2019 Data Showcase
Oliviero Riganelli, Marco Mobilio, Daniela Micucci (University of Milano-Bicocca, Italy), Leonardo Mariani (University of Milano-Bicocca)
15:12
6m
Talk
RapidRelease - A Dataset of Projects and Issues on GitHub with Rapid Release
MSR 2019 Data Showcase
Saket Joshi (Indian Institute of Technology Tirupati), Sridhar Chimalakonda (Indian Institute of Technology Tirupati)
15:18
6m
Short-paper
A Tool to Analyze Packages in Software Containers
MSR 2019 Technical Papers
Ahmed Zerouali (UMONS), Valerio Cosentino (Bitergia), Jesus M. Gonzalez-Barahona (Universidad Rey Juan Carlos), Gregorio Robles (Universidad Rey Juan Carlos), Tom Mens (University of Mons)
Pre-print
15:24
6m
Talk
An Empirical History of Permission Requests and Mistakes in Open Source Android Apps
MSR 2019 Data Showcase
Gian Luca Scoccia, Anthony Peruma (Rochester Institute of Technology), Virginia Pujols, Ben Christians, Daniel Krutz (Rochester Institute of Technology)