Data Driven Exploration of the R Community
The R language has gained prominence in recent years for data science and statistical analysis. Its adoption across industry and education is widely reported to have been helped by the welcoming nature of the R community.
One of the most important metrics for monitoring the growth of the R community is the number of R User Groups (RUGs) around the world. Most of these groups use meetup.com to arrange their events. Meetup.com keeps track of each group's membership and meetings over time and also provides an API for accessing these data.
The R Consortium has identified RUGs as an important way to grow and maintain the R community, and has set up a RUG support program to fund these groups at several levels. It has also identified the need to diversify the R user base to include under-represented groups, and has funded the R-Ladies project, the FORWARDS project and the R Community Diversity Initiative-Working Group (RCDI-WG) to further advance the adoption of R across several spheres. Currently, the RUGS program and the R-Ladies project are two of the three top-level (high priority) projects supported by the R Consortium.
With a whopping 185 projects over 11 years, the Google Summer of Code program (GSoC) has been a great platform for attracting students to the R and open source communities, and for getting a lot of work done. Infrastructure that tracks these projects over time, and that discovers and recognizes the efforts of mentors in a data-driven manner, is important for future program management and expansion.
There appears to be a need to search for R user / R-Ladies groups by country and city, find a URL linking to their page online, etc.
There also appears to be a need to see, at any point in time, the regions where R user / R-Ladies groups are under-represented.
There is a need to know the number of R users on meetup.com and to track it over time, possibly by region.
There is no infrastructure, such as a dashboard, to track the yearly count of projects, mentors, and project types (work products). Nor have projects been publicly classified by work product (e.g. new package, infrastructure, documentation, etc). Thus there is a need to discover R-GSoC projects in a data-driven manner, to recognize the mentors who have spent the most time mentoring students via GSoC, and to keep track of figures on students, projects, etc.
It is important to keep track of these groups and monitor their growth, using a data-driven approach to determine their geographical diversity around the world over time. A region that is under-represented now may not remain that way in the future, so a way to monitor this over time is valuable.
It is important to know which cities have the most R user groups. This could help companies or individuals target areas where they can find many R users, or locations where R flourishes the most.
When R-GSoC projects, students and mentors are discovered and publicized effectively, the program has the potential to attract more participants over time and become more sustainable. Attracting more participants (especially mentors) relieves overburdened mentors, making them more willing to participate in future.
Having all these summaries in a dashboard provides an aggregated but broad overview.
Jumping Rivers maintains a list of RUGs and R-Ladies groups on GitHub, but this list is not complete.
The RUGS program provides a list of groups that have been funded by the R Consortium, but these are few compared with the actual number of R user groups.
The R-Ladies project provides a dashboard showing summaries and stats around their chapters. It is a Shiny app that depends on Shiny Server, with associated costs that a static site hosted for free on GitHub would avoid, and it lacks the search functionality that this project will offer.
Hans W Borchers provides a list of R-GSoC Projects from 2008-2014 here. R-GSoC Org admins provide another list in CSV format here.
The project primarily proposes to use the meetupr R package developed by the R-Ladies org to query the API of meetup.com and retrieve data for R user / R-Ladies groups and display summaries via a static dashboard.
Highlights:
- Create an R script that uses the `meetupr` package to retrieve data for all R user groups on meetup.com
- Create an R script that uses the `meetupr` package to retrieve data for only R-Ladies groups on meetup.com
- With code, arrange these data in a consistent format and save them in CSV files
- Create two static dashboards using HTML / CSS / JavaScript that can be hosted via GitHub pages to show summaries for these data by writing JS scripts to read and analyze the CSV files. Summaries include:
  - total number of user groups
  - total number of group members for all R user groups and R-Ladies groups
  - number of cities, countries and/or regions with R user group presence, etc
- Dashboards will include DataTables of user group data which can be searched and sorted interactively.
- Dashboards are proposed to contain interactive charts and a map of group counts across countries.
- Leaderboards to highlight most active groups, recognize most active organizers, etc.
- Work-product classification for all 185 past GSoC projects plus this year’s projects
- Mentor names should be normalized so that they are consistent (some are not full names, differ along the list, or are just user names)
- Show count of unique mentors to date
- Show count of returning mentors
- Show number of students returning as mentors
- Chart of most valuable mentors (based on number of projects mentored to date)
- Chart of yearly number of projects
- Chart of number of mentors per year
- Chart of Work-Product Distribution
- Number of projects funded to date and number of students who have participated to date
- Deploy dashboard pages via GitHub Pages
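The R-GSoC counts listed above (unique mentors, returning mentors, students returning as mentors, projects per year, and a mentor leaderboard) could be derived along the following lines. This is only a rough sketch under stated assumptions: the sample rows and the field names `year`, `title`, `student` and `mentors` are hypothetical, "returning" is assumed to mean "mentored in more than one year", and name normalization is limited to whitespace and case. Mapping user names to full names would still need a hand-maintained alias table.

```javascript
// Hypothetical sample rows; the real R-GSoC CSV may use different column names.
const projects = [
  { year: 2015, title: "Project A", student: "Ada Example", mentors: ["Bob  Mentor", "carol mentor"] },
  { year: 2016, title: "Project B", student: "Dan Example", mentors: ["bob mentor"] },
  { year: 2017, title: "Project C", student: "Eve Example", mentors: ["Ada Example", "Carol Mentor"] },
];

// Build a comparison key for a name: trim, collapse whitespace, lower-case.
function nameKey(name) {
  return name.trim().replace(/\s+/g, " ").toLowerCase();
}

function mentorStats(rows) {
  const yearsByMentor = new Map();    // mentor key -> Set of years mentored
  const projectsByMentor = new Map(); // mentor key -> number of projects
  const projectsPerYear = new Map();  // year -> number of projects
  const students = new Set();         // student keys

  for (const row of rows) {
    projectsPerYear.set(row.year, (projectsPerYear.get(row.year) || 0) + 1);
    students.add(nameKey(row.student));
    // De-duplicate mentors within a project before counting.
    for (const key of new Set(row.mentors.map(nameKey))) {
      if (!yearsByMentor.has(key)) yearsByMentor.set(key, new Set());
      yearsByMentor.get(key).add(row.year);
      projectsByMentor.set(key, (projectsByMentor.get(key) || 0) + 1);
    }
  }

  const mentorKeys = [...yearsByMentor.keys()];
  return {
    uniqueMentors: mentorKeys.length,
    // Assumption: a returning mentor has mentored in more than one year.
    returningMentors: mentorKeys.filter((k) => yearsByMentor.get(k).size > 1).length,
    // Mentors whose normalized name also appears as a student.
    studentsTurnedMentors: mentorKeys.filter((k) => students.has(k)).length,
    projectsPerYear: Object.fromEntries(projectsPerYear),
    // Sorted by project count (descending), then by name.
    leaderboard: [...projectsByMentor.entries()].sort(
      (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
    ),
  };
}

const stats = mentorStats(projects);
console.log(stats);
```

The returned arrays and objects are plain JavaScript data, so they could be passed to a charting library or a DataTable without further conversion.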
- Help new users easily locate user groups or links on how to get help to start one.
- Displaying count of groups over an interactive world map could help in the identification of areas where groups are under-represented.
- Could help R Consortium’s RCDI-WG, R-Ladies, and RUGS projects monitor trends around user-group expansion worldwide.
- Could provide a way for the R Foundation to estimate the number of R users worldwide and plan diversity scholarships or other similar packages for useR! conferences.
- The output of this project could help provide better insights on how to develop an infrastructure to automatically track R events on a calendar (a need pointed out by the R Consortium here.)
- This work will add further validation for the `meetupr` package and possibly contribute to its vignette.
- A data-driven dashboard for R-GSoC projects could serve as a potential tool for the R Foundation, community and org admins, to understand trends, gain insights and potentially improve existing processes to attract more students and mentors
- Claudia Vitolo is a member of the R-Ladies Global Leadership team. She is a voting member of the R Consortium’s Infrastructure Steering Committee (ISC), co-author of the `meetupr` package and has authored several R packages.
- Rick Pack is a data scientist at LabCorp, and a contributor to and heavy user of the `meetupr` package. He is an Analytics>Forward event organizer and R package author.
- Hans W. Borchers is the maintainer of the “Optimization and Mathematical Programming” CRAN Task View. He has mentored several R-GSoC projects and has a list of R-GSoC projects here.
- Show proof of work using the meetupr package to query the Meetup.com API
- Show proof of work using d3.js and pure JavaScript to read CSV data, group it using the nest() function, and log data on the console.
- Show proof of work using JavaScript to display CSV data on a DataTable
- Show proof of work using echarts.js to visualize JSON or JavaScript array data.
- Display knowledge of using any open-source HTML Dashboard Bootstrap template.
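As a rough illustration of the CSV grouping test above, the sketch below reads CSV text, groups the rows by country, and logs the kind of summaries the dashboard is meant to show. It uses plain JavaScript only. The sample rows and the column names `name`, `city`, `country` and `members` are made up for illustration, and the parser assumes a header row with no quoted fields. With D3 the same grouping would traditionally be written with `nest()`; D3 v6 and later replace `d3.nest()` with `d3.group()` and `d3.rollup()`.

```javascript
// Made-up sample rows; the real CSV written by the R scripts may have other columns.
const csvText = `name,city,country,members
Sample RUG Alpha,Lagos,Nigeria,300
Sample RUG Beta,Abuja,Nigeria,150
Sample RUG Gamma,London,United Kingdom,900
Sample RUG Delta,London,United Kingdom,2500`;

// Minimal parser: assumes a header row and no quoted or embedded commas.
function parseCsv(text) {
  const [header, ...lines] = text.trim().split(/\r?\n/);
  const columns = header.split(",");
  return lines.map((line) => {
    const cells = line.split(",");
    return Object.fromEntries(columns.map((c, i) => [c, cells[i]]));
  });
}

// Group rows by a key column and roll each group up to counts,
// similar in spirit to nest().key(...).rollup(...).
function summarizeBy(rows, keyColumn) {
  const out = new Map();
  for (const row of rows) {
    const entry = out.get(row[keyColumn]) || { groups: 0, members: 0 };
    entry.groups += 1;
    entry.members += Number(row.members);
    out.set(row[keyColumn], entry);
  }
  return out;
}

const rows = parseCsv(csvText);
const byCountry = summarizeBy(rows, "country");
const totals = {
  groups: rows.length,
  members: rows.reduce((sum, r) => sum + Number(r.members), 0),
  countries: byCountry.size,
  // City names can repeat across countries, so key on both.
  cities: new Set(rows.map((r) => `${r.city}|${r.country}`)).size,
};

console.log(totals);
console.log([...byCountry]);
```

A production dashboard would more likely rely on a CSV library that handles quoting, but the grouping and roll-up logic would stay the same.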