DataFest 2021
Thirty-six teams participated in ASA DataFest @ EDI 2021, held from 26 to 28 March 2021.
ASA DataFest is a data analysis competition in which teams of up to five students tackle a large, complex surprise dataset over a weekend to find and communicate insights into the data. The teams that impress the judges win prizes, and the event is a great opportunity for participants to gain data analysis work experience.
Undergraduate students from 10 different schools across the University of Edinburgh and Heriot-Watt University took part in the event, with academic staff, postgraduate students, and data scientists from industry attending as consultants to guide participants throughout the weekend.
For this competition, the Rocky Mountain Poison and Drug Safety Center was interested in discovering and identifying patterns of drug use, with particular attention to misuse. These could include patterns describing demographic profiles within a given category of drug, or combinations of drugs that frequently appear together. One goal they were keen to achieve with the data was predicting future cases of drug misuse. The data sets came from participants who responded to an online request and were paid to take part.
For this challenge, participants were asked to produce two products:
- a six-minute video detailing their primary findings; and
- a one-page summary that included the primary questions investigated (or the general goal of the project), the methods used, and a brief description of the findings.
Over the weekend, participants worked with their teammates and with consultants via Gather Town. Several virtual get-togethers were held on Gather Town’s virtual rooftop, where everybody involved could meet and get to know each other! The award ceremony was held over Zoom on 2 April.
Judging
Our judges for this year’s DataFest @ EDI were:
- Ksenia Aleksankina – Data Scientist, Mirador Analytics
- Nicole Augustin – Faculty, University of Edinburgh
- Philip Darke – Actuary and PhD Researcher, Mercer and Newcastle University
- Joanna Faulds – Senior Data Scientist, BBC
- Ruth King – Faculty, University of Edinburgh
- John MacInnes – Emeritus Professor of Sociology and Statistics, University of Edinburgh
The judges first reviewed and scored the submissions independently, then deliberated on a Zoom call to decide the winners.
The judges remarked that “it’s so hard to pick a winner, everybody did so well, but it was honestly really true that every team did something great. That could have been a slick presentation, novel visualisation or just some general comments that showed a really deep understanding of the data or the problem. So, you should all be really proud about what you’ve achieved over just a weekend, it’s quite incredible.”
Awards
🏆 Best insight: The Bayes-sic Team
Cannabis Usage and Prediction
Zeno Kujawa, Greig Rowe and Lee Suddaby
📹 Video presentation ✨ Shiny app
The aim of this project was to investigate which factors are correlated with an individual’s cannabis use. In doing so, we wanted to find which variables would be most useful for developing a questionnaire to predict cannabis misuse in the USA. The analysis was based on the 2019 survey results.
The judges were very impressed with the level of statistical analysis the team presented. They liked how the team picked a single question and went into great depth with it. They also appreciated the Shiny app the team built to summarise their results, and noted how the team considered the ethics of data collection. Very impressive work, well done!
🏆 Best visualisation: Hippopotamus Testing
Demographics and geography of drug use in the UK
Michael Renfrew, Michał Kobiela, Kaiya Raby and Stanislaw Szcześniak
The team used data from the Survey of Non-Medical Use of Prescription Drugs (NMURx) Program to analyse trends in drug use in the UK. Their analysis was performed in R and focused mostly on geographical and demographic trends.
The judges felt that the geographic focus was successful, and the team’s presentation included some brilliant visualisations, for example the postcode heat maps and the comparisons across age and by sex. Survey weights were accounted for in the confidence intervals, and the team offered possible explanations for some of the patterns identified while also highlighting important limitations, such as sample sizes across postcode groups. The presentation was professional and engaging. The judges emphasised that the team should be proud of what they achieved over the weekend. They also mentioned that older people having more drugs lying around was a new insight for them, and they’re curious about what’s going on in the Highlands!
🏆 Best use of outside data: TheThreeMusketeers
Recreational drug predilections
Benjamin Gardner and Matthew Reidy
The goal of the project was to determine recreational drug predilections by demographic group in Scotland and the rest of the UK (RUK), to compare them, and to explore potential reasons underlying Scotland’s high number of drug deaths. Higher consumption of MDMA appears to be correlated with the higher Scottish death rate, and this should be researched further. There are also a number of significant differences in drug preferences between the two groups.
The judges were very impressed with the focus the team picked and how they brought in external data on Scottish drug deaths. This is a topic of huge importance and social relevance, and the judges really liked that the team were able to tie it into their project. They also liked the team’s heatmaps comparing Scotland to the rest of the UK.
🏆 Judges’ pick: FlyingPenguins
Drug usage and mental health disorders
Arnav Bhargava, Purvi Harwani, Arjun Nanning Ramamurthy, Laura O’Sullivan and Pablo Ortuno Floria
📹 Video presentation ✨ Shiny app
The team explored the relationship between drug usage and mental health disorders. To do this, they created a Shiny app focused primarily on three facets: non-illicit drug use, illicit drug use, and the demographics relating the two. They made the app interactive by allowing users to apply filters such as gender, mental health disorders, substance abuse, and illicit drug use. They looked at the number of users of each non-illicit drug, the percentage of those cases involving non-medical use, and related statistics.
The judges were very impressed with how much functionality the team packed into their dashboard. The team’s presentation was professional and highlighted important limitations of the sampling approach, the sample sizes, and the complexities of mental health, which the judges felt also made the project strong!
🏆 Judges’ pick: Team Schoffee
Understanding drug misuse among healthcare workers in the United Kingdom
Syaqilah Farihah Binti Akmal Hisham, Serena Inez Binti Rafizal, Siti Rohmah Binti Satitan, Nurul Binti Yazid and Nicholas Goguen-Camponi
📹 Video
How do healthcare workers take care of themselves in terms of drug misuse? Team Schoffee defined drug misuse as taking drugs without a doctor’s prescription or for a reason not recommended by a doctor; this does not imply any dependency on the drug. The data show that a large proportion of healthcare workers (66%) have misused drugs at least once. The team found this result surprising because they expected healthcare professionals to understand the consequences of misusing drugs.
The judges loved how professional the team’s slides and presentation were. They were impressed with the team’s focus on drug use among medical professionals, as well as how they communicated their modelling results via a confusion matrix. Very well done!
🏅 Honourable mention (Best insight): JGGL
Misuse and severity
Gabrielle Gaudeau, Jai Karayi, Gareth Lamb and Luca Terry
📹 Video
Team JGGL focused on three things: finding a way to measure the breadth of drugs a person has misused, weighted by the severity of each drug; examining how representative the data set is of the UK as a whole; and using this measure to see how the score changes across demographic groups.
The judges commended the team on their specific focus on the maximum possible prison sentence, as well as on how they compared the data provided with other data sources to get a sense of its representativeness. They were also impressed with the team’s clear presentation.
Best team name: Abraca-data
Dave Diaper, Adam Henderson, Jonah Ramponi, Lyndon Scott Humphris and Robin Weersma