These days, there is certainly no shortage of published material about data science. In addition to the somewhat vague Wikipedia definition and the manifesto by Mike Loukides, a respectable number of words have already been written on the subject. Yet, as Tom Heath nicely puts it, most articles simply “bury a concise definition of the term among descriptions of the characteristics, methods or responsibilities of data scientists”. More often than not, the discussion of what data science represents is obscured by predictions about the importance of the discipline in future business endeavors or the profitability of a career in the field. It is not uncommon to get the feeling that data science, much like big data, is something everyone talks about but few really master…
“Data science isn’t just about the existence of data, or making guesses about what that data might mean; it’s about testing hypotheses and making sure that the conclusions you’re drawing from the data are valid”
This quote by Mike Loukides seems to be a pretty accurate, yet general, description of the scientific method. What distinguishes data science from other scientific disciplines, however, is the multidisciplinary set of skills that a practitioner likely needs to master, which spans traditionally non-overlapping fields. As the diagram that illustrates this post summarizes, some of the knowledge areas required for a solid background as a data scientist are:
- Mathematics and Statistics
- Data Engineering
- Computing and Programming Skills
- Domain Expertise
- Hacker Mindset
- Communication Skills
Here at The Data Science Lab the idea is to approach data science from the perspective of “doing science with data” and to hone the above skills while doing so. That is arguably nothing strictly new, since many scientific disciplines rely on the analysis of some kind of data and on continuous learning; the focus here, however, lies on extracting novel insights from diverse (open) data sources. The emphasis will be on how the data is used and on the methods employed, and not so much on the data per se. As for the “science” part, the standard scientific method will be applied in order to get the most out of the data (but without torturing it). Creativity is key, and that is precisely what makes data science so fascinating, as the story that hides within the data is often not immediately apparent. By practising with real datasets in the controlled environment of a simple laptop, we can learn a great deal and become better “data detectives”. And, perhaps more importantly, we will have tons of fun along the way!
I am a scientist, and I like data
Logic seems to dictate that this makes me a data scientist! As such, I am interested in using the scientific method to observe and measure a variety of phenomena in order to formulate and test hypotheses about nature.
Every scientist needs tools, and possibly a laboratory or controlled environment in which to pursue experiments. At The Data Science Lab we will use the tools of open data and open source software, as well as a laptop, to perform what I would call “table-top experiments with data”.
Let’s go exploring! We never know what we might find out.