DataFest 2021 @ EDI

University of Edinburgh & Heriot-Watt


ASA DataFest 2021 @ Edinburgh was held virtually over the weekend of March 26-28, 2021, with participation from students at The University of Edinburgh and Heriot-Watt University.

Find out more about how the event went and the winners here.

What is DataFest?

ASA DataFest™ is a data analysis competition where teams of up to five students attack a large, complex, and surprise dataset over a weekend. Your job is to represent your school by finding and communicating insights into these data. The teams that impress the judges will win prizes as well as glory for their school. Everyone will have a great experience, lots of food, and fun!

ASA DataFest™ is also a great opportunity to gain experience that employers are looking for. Having worked on a data analysis problem at this scale will certainly help make you a good candidate for any position that involves analysis and critical thinking, and it will provide a concrete example to demonstrate your experience during interviews.

ASA DataFest™ at the University of Edinburgh is organized by the School of Mathematics.


While ASA DataFest™ is a competition, the main goal of the event is to promote collaboration. Here are some testimonials from past participants:

"I highly encourage all students to take part in this event. Not only is it incredibly fun, but it also gives you an opportunity to showcase/develop your analytical, teamwork, and time management skills, all of which are highly valuable to a potential employer."

"I think everyone should sign up for DataFest! No matter what your background is, you will certainly provide valuable insight that other teammates might not have! Beyond that, it is really fun to muck around in realistic data and see what insights you can find."

"The event was a great opportunity to challenge my coding skills and work with people I had not met before. At the end of the weekend, I had made new friends and learnt a lot. I strongly recommend it."

Past DataFests

2021 - Rocky Mountain Poison and Drug Safety Center

Goal: For this competition, the Rocky Mountain Poison and Drug Safety Center was interested in discovering and identifying patterns of drug use, with particular attention paid to identifying misuse. These could include patterns that might describe demographic profiles within a given category of drug or combinations of drugs that frequently appear together. One goal they were keen to achieve with the data was to predict future drug misuse cases. The data sets came from participants who responded to an online request and were paid to participate. Click here to read more about the submissions from the winning teams at ASA DataFest 2021 @ EDI.

2020 - COVID-19

Goal: For this competition, we challenged participants to explore the societal impacts of the COVID-19 pandemic other than its direct health outcomes. Participants were allowed to explore anything from the effects on pollution levels and transportation levels to working from home. They could investigate changes in the number of people posting on TikTok with their families or analyze online education. We left the focus up to them and urged them to be thoughtful and creative as they analyzed data and communicated their insights about some of the pandemic's impacts on society. Click here to read more about the submissions from the winning teams at ASA DataFest 2020 @ EDI.

2019 - Canadian National Women's Rugby Team

Goal: How do we quantify the role of fatigue and workload in a team’s performance in Rugby 7s? How reliable are the subjective wellness data? Should the quality of the opponent or the outcome of the game be considered when examining fatigue during a game? Can widely used measurements of training load and fatigue be improved? How reliable are GPS data in quantifying fatigue?

2018 - Indeed

Goal: What advice would you give a new high school student about what major to choose in college? How does Indeed's data compare to official government data on the labor market? Can it be used to provide good economic indicators?

2017 - Expedia

Goal: How do visitors' searches relate to the choices of hotels booked or not booked? What role do external factors play in hotel choice?

Expedia provided DataFesters with data from search results from millions of visitors around the world who were interested in traveling to destinations all over the world. The data were in two files, one of which included data collected on search results from visitors' sessions, and another which contained detailed information about the destinations that visitors searched for.

2016 - Ticketmaster

Goal: How can site visits be converted to ticket sales, and how can Ticketmaster identify "true fans" of an artist or band?

Data consisted of three sets. One included events from the last 12 months that tracked customer travel through the website. Another provided information about advertising campaigns on Google, and the third included data on the events themselves.

2015 -

Goal: Detect insights into the process of car shopping that can help make the process easier for customers.

Data consisted of visitor 'pathways' through a website that helps customers configure car features and shop for cars. Five data files were linked by a customer key and included data about the customer, about his or her visits to the webpage, and, when applicable, about the car purchased and the dealership where the car was purchased.

2014 - GridPoint

Goal: Help understand how customers can best save money and energy.

Data consisted of a random sample of customers, with five-minute aggregates over a year of energy consumption that was then aggregated across important features of the commercial properties, as well as supporting climate and location data.

2013 - eHarmony

Goal: Help understand what qualities people look for in prospective dates.

The DataFest students worked with a large sample of prospective matches. For each customer, data were provided on his or her preferences, as well as four matches, their preferences, and information about whether parties contacted one another.

2012 -

Goal: Help understand what motivates people to lend money to developing-nation entrepreneurs and what factors are associated with paying these loans.

Several data sets were provided, including characteristics of lenders and borrowers and loan pay-back data.

2011 - Los Angeles Police Department

Goal: Make a data-based policy proposal to reduce crime.

Data consisted of arrest records for every arrest in Los Angeles from 2005-2010, including time, location, and weapons involved.