Doing Science With Data

These days, there is certainly no shortage of published material about data science. In addition to the somewhat vague Wikipedia definition and the manifesto by Mike Loukides, a respectable number of words have already been written on the subject. Yet, as Tom Heath nicely puts it, most articles simply “bury a concise definition of the term among descriptions of the characteristics, methods or responsibilities of data scientists”. More often than not, the discussion of what data science represents is obscured by predictions about the importance of the discipline in future business endeavors or the profitability of a career in the field. It is not uncommon to get the feeling that data science, much like big data, is something everyone talks about but few really master.

“Data science isn’t just about the existence of data, or making guesses about what that data might mean; it’s about testing hypotheses and making sure that the conclusions you’re drawing from the data are valid”

This quote by Mike Loukides seems to be a pretty accurate, yet general, description of the scientific method. However, what distinguishes data science from other scientific disciplines is the multidisciplinary set of skills that a practitioner likely needs to master, which is spread across fields that traditionally do not overlap. As the diagram that illustrates this post summarizes, some of the knowledge areas required for a solid background as a data scientist are:

  • Mathematics and Statistics
  • Data Engineering
  • Computing and Programming Skills
  • Domain Expertise
  • Hacker Mindset
  • Communication Skills

Here at The Data Science Lab the idea is to approach data science from the perspective of “doing science with data” and to hone the above skills while doing so. Arguably that is nothing strictly new, since many scientific disciplines rely on the analysis of some kind of data and on continuous learning, but the focus here lies on extracting novel insights from diverse (open) data sources. The emphasis will be on how data is used and on the methods employed, and not so much on the data per se. As for the “science” part, the standard scientific method will be applied in order to get the most out of the data (but without torturing it). Creativity is key, and that is precisely what makes data science so fascinating, as often the story that hides within the data is not immediately apparent. By practising and dealing with real datasets in the controlled environment of a simple laptop we can learn a great deal and become better “data detectives”. And, perhaps more importantly, we will have tons of fun along the way!
