Data mining is one of those new “technology terms” that many of us have heard but couldn’t define if asked (hint: that was me a week ago). Let’s start with an understanding of what data mining is, and then we’ll look at why it is important in the field of history. Here is a simple definition of data mining:
Data mining: the practice of analyzing large databases in order to generate new information.
It could be argued that we have “mined” for data as long as we have desired to learn about the past. Looking through historical records for “data” is nothing new; historians have been doing it for eons. However, today’s data mining is much different. With vast amounts of the historical record now being digitized, we can use new technologies, algorithms, computer programming, and even artificial intelligence to analyze the data in ways that were not previously available to us. Data mining can change the way we look at historical events.
One example of how these new data mining techniques are being put into practice is seen in the work that Harvard University students Erez Lieberman Aiden and Jean-Baptiste Michel are doing. Michel had this to say:
There are many usages of this data, the bottom line is that the historical data is being digitized.
Taking advantage of the fact that Google has now digitized millions of books, Aiden and Michel have applied data mining techniques to this massive amount of information to help us understand history in a new way. The number of times that certain words appear can show the interests and concerns of society during a certain time period. How often words or phrases come up in the data can also show cultural shifts and changing attitudes in society. Aiden and Michel refer to this as “culturomics.” Other topics, such as career popularity and censorship, can also be studied using these data mining methods. Following is a TED Talk given by these brilliant Harvard students.
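To make the word-counting idea concrete, here is a minimal Python sketch of how one might measure how often a word appears in texts from different years. This is not Aiden and Michel’s actual method or tooling; the function name `relative_frequency`, the simple tokenizing rule, and the tiny sample corpus are all assumptions made up for illustration.

```python
from collections import Counter
import re


def relative_frequency(texts_by_year, word):
    """Return {year: share of all words in that year's texts equal to `word`}."""
    word = word.lower()
    result = {}
    for year, texts in sorted(texts_by_year.items()):
        counts = Counter()
        for text in texts:
            # Crude tokenizer (an assumption): lowercase runs of letters/apostrophes.
            counts.update(re.findall(r"[a-z']+", text.lower()))
        total = sum(counts.values())
        # Divide by the total so years with more text are not over-counted.
        result[year] = counts[word] / total if total else 0.0
    return result


# Made-up sample corpus, purely for illustration (not real historical data).
sample = {
    1850: ["The telegraph carried news across the country."],
    1950: [
        "The television brought news into the home.",
        "Television changed the evening.",
    ],
}

trend = relative_frequency(sample, "television")
for year, share in trend.items():
    print(year, round(share, 3))
```

Dividing by the total word count matters because far more text survives from some years than others; comparing raw counts alone would mostly show how much was published, not what people were writing about.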
Another great example of how data mining can change the way we look at history is the work done by Zoe Alker. Alker and her team have looked through thousands of convict records kept by British judicial authorities from 1793 to 1925. From these records she has mined data regarding British and Australian convicts, specifically about the tattoos that they had on their bodies. After formulating rather complex research methodologies and criteria, she applied modern technologies such as data visualization to help us understand a previously misunderstood aspect of history in a new way. So what possible contribution could learning about convict tattoos offer, you ask? Apparently, quite a lot. From data mining this information about Victorian-era tattoos on convicts, we can learn a lot about the culture of that time. Alker has shown that tattoos were not just used by the criminal class in society, as had been previously understood. Rather, tattoos were used by a wide range of individuals, from laborers to businessmen, to express a wide range of sentiments. Tattoos were used for personal expression, pretty much the same reason they are used today. Through data mining methods, Alker has changed the way we understand this small slice of history. Alker points out that data mining isn’t always easy, especially when the historical record isn’t crystal clear on the subject. When asked about research methodologies and the challenges faced in mining data, she offered this advice:
It’s about being able to use these different techniques, to knowing your data, and really understanding your data, and getting to know and test and refine that data, in order to be able to come up with really surprising conclusions.
Following is a video lecture by Zoe Alker talking about her project and her data mining techniques. (The video actually starts at the 8 minute 42 second mark.)
There are some great resources out there to learn more about data mining and how to use it in your research. The following website by the University of Queensland contains yet more examples of data mining.
Today’s data mining is a lot more than sitting down at a library and looking through a dusty box of archival material. It is taking massive amounts of information, quite often information that has already been digitized, and applying modern technologies and historical analysis to understand history in a new way. It is important to note that much of the historical record, or “data,” isn’t new. The millions of books that Aiden and Michel used to give us new historical insights were already there. What is new are the methods by which we are mining that data. And these new methods are opening our eyes to new and exciting ways of understanding history.