Crunching Numbers

BOW DataFest Teaches Statistical Analytic Methods, Life Lessons 

Spending a whole weekend manipulating and analyzing a large set of corporate data may not be everyone’s idea of an exciting way to use their free time. But for a group of 21 Babson, Olin and Wellesley students, the weekend of April 6 – 8 was exactly that.

They were taking part in the BOW DataFest, a kind of hackathon for statistics that brought together teams from the three institutions to analyze a real, large-scale data set provided by a company. The American Statistical Association sponsors the event. 

The challenge: see who can do the best job analyzing the welter of data and coming up with conclusions based on their observations.

The event commenced Friday evening, April 6, continued all day that Saturday, and concluded Sunday around noon, followed by team presentations. Five teams — one from Olin, two from Wellesley, one from Babson and a mixed Wellesley/Babson team — completed the challenge.

Students worked in teams of three to five in Babson’s Blank Center, where they could be seen hunched around tables, tapping at laptops and discussing their calculations, as industry and faculty mentors circulated among them, offering help.

The data set, a collection of job listings provided by the job site indeed.com, included such information as job category, along with the educational background and work experience required for the position.

“The thing about the data is, it’s really massive,” says Ed Soares, a visiting professor of data science at Olin who was on the organizing committee for the event. “The students look at it in a holistic sense, maybe see what variables were measured and how they might relate to each other. They ask whether there are some interesting questions that that data might be able to answer, and then apply some statistical or analytic methods to the data in order to answer those questions.”

Olin sophomore Kyle Combes tackled the challenge with classmates Sophie Schaffer and William Derksen “to have fun and explore something new.”

The team spent Friday evening downloading the data set and taking a first look, eventually deciding to create a model for estimating the take-home salary of applicants in the various job categories in different regions of the country, based on their qualifications.

The team spent a lot of time trying to come up with additional data sets that would allow them to estimate the cost of living in different regions of the country. This proved a difficult challenge.

Applying analytical methods such as linear regression, the team succeeded in coming up with a model that fit the data well, but wasn’t great at estimating salaries. In the end, they had a model that output an estimated salary with an average error of $18,000, which Combes describes as “not horrible, but not good enough.”

“I would encourage future teams not necessarily to spend all their time on one project or one particular idea if it doesn’t seem to be panning out,” he advises.

Nevertheless, Combes still feels the exercise was worthwhile.

“It gives you a sense of confidence that you can do something interesting and innovative with data, or at least start working toward something that’s new and insightful, even if you don’t have a lot of experience,” says Combes. “It doesn’t take a Ph.D. to uncover something new and insightful that could be of use.”

For Soares, the exercise was valuable because it teaches students a lesson that applies to both career and life. Unlike in mathematics, which tends to move in a straight line from problem to solution, in statistics students must learn to grapple with a great deal of uncertainty.

“I’d like to get every student to understand that life is really about ambiguity, not just in statistics, but in life in general,” notes Soares. “As soon as you understand this, your world is going to open up, because then you realize how the world actually works.”

At an awards ceremony at Babson’s Park Manor West on the Monday evening after the event, a panel of judges handed out awards to team “bowchickadatawow” (Wellesley) for Best Visualization and Best Data Analysis combined; to team “t-best” (Wellesley) for Best Use of External Data; and to team “Woodland” (Babson) as the “Most Promising” team. The Olin team, “Shenanigans,” won Honorable Mention.