What exactly is data mining, as a skill, anyway?
It’s an odd term. "Data mining" sounds as if professionals mine for data the same way gold miners mine for gold.
The term is misunderstood by many professionals, says Max Galka, CEO of Revaluate, a company that specializes in housing and lifestyle data (currently in beta).
“When a data scientist speaks of data mining, he or she is referring to a specific type of machine learning, which is used to find patterns in an already existing set of data," Galka says.
"Essentially, data mining is digging through the data to find patterns."
First, Why is Data Mining in Such High Demand?
The information age has removed many of the limits on how much information one can acquire. From healthcare to the automotive industry, businesses are constantly collecting valuable information from their customers about their buying and usage habits.
Making sense of accumulated data can help give businesses a competitive edge by making better predictions for business decisions, creating a data-backed strategy and improving product development.
Businesses look to data scientists, who combine high-level math, statistics and programming, to develop algorithms that extract and make sense of the Big Data accumulated over time.
The Biggest Misconception About Data Mining
Business decision-makers who lean on data scientists to come up with data-backed prediction models need to keep one thing in mind: "There is no age-old recipe for extracting meaningful data from business operations," says Ray Bao, data scientist at CyberCoders.
Most universities don't teach data mining as it relates to data science in their computer science curricula. The closest things to schooling for data mining at the university level are relatively new post-graduate programs, like the University of California, Berkeley's newest offering: UC Berkeley's Masters of Information & Data Science.
"However, oftentimes, what is taught in academia pales in complexity to real-world problems (e.g. the classic Netflix contest) and that's when the ingenuity and creativity of data scientists are called upon," Bao says.
In other words, it's one thing to be able to pull data based on a given number of factors. It's a whole other playing field to pull data and then proactively ask the right questions, thinking critically to solve complex business problems.
At its Core: Data Mining Comes Down to Statistics
The best data mining professionals have a strong foundation in statistics.
Galka describes these as “the basic principles of knowledge induction”:
- Regression analysis
- t-tests
- Analysis of variance
“Computer programming and database skills are important as well, but they come secondary to the stats,” he adds.
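To make two of the techniques listed above concrete, here is a minimal sketch of a simple regression and a t-test using only Python's standard library. The function names and the sample numbers are hypothetical and chosen for illustration; a real analysis would more likely use a statistics package and would also report p-values and confidence intervals.

```python
from math import sqrt
from statistics import mean, variance


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def welch_t(a, b):
    """Welch's t statistic for the difference between two sample means."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical data: home size (hundreds of sq ft) vs. price (thousands of dollars).
sizes = [10, 15, 20, 25, 30]
prices = [200, 290, 410, 500, 600]
slope, intercept = fit_line(sizes, prices)

# Hypothetical data: a metric measured for two groups of customers.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
t_stat = welch_t(group_a, group_b)

print(f"price = {slope:.1f} * size + {intercept:.1f}")
print(f"t statistic = {t_stat:.2f}")
```

The regression answers "how does price move with size?", while the t statistic indicates whether the gap between two group averages is large relative to the noise in the samples.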
Data Science vs. Software Development
Data science and software development are two core niches of data mining, and the top skills for success in data mining depend on your focus, according to Joshua Fox, PhD, a software architect and technical evangelist.
Keep in mind: Developers can write software that captures data from business operations defined by management and stores it in a database, but that alone doesn't encompass "data mining."
"Data scientists, backed by mathematics and statistical methods, determine why data is stored and what data to store," Bao says. "Furthermore, we are concerned beyond how data is stored and look to extract meaningful conclusions and guidance."
Data Science:
“If you want to get into the data science side, then a deep knowledge of the basic algorithmic tools is essential,” Fox says.
- Clustering
- Collaborative filtering
“It’s better to dive into a few powerful categories of algorithms and understand how to tune and tweak them than to try to learn the cutting-edge math in every area of data mining,” Fox says.
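As an illustration of the clustering category mentioned above, the following is a small sketch of k-means, one common clustering algorithm (the article does not name a specific one). The function name, the sample points, and the choice to seed the centroids with the first k points are all assumptions made to keep the example short and repeatable.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly reassigning and re-averaging."""
    # Naive initialization: use the first k points as starting centroids.
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical 2-D data with two visually obvious groups.
points = [(1, 1), (1, 2), (8, 8), (9, 8), (2, 1), (8, 9)]
centroids, clusters = k_means(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

Tuning here means choices like the number of clusters, the starting centroids, and the distance measure, which is the kind of hands-on adjustment Fox recommends getting comfortable with.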
Software Development:
In software, you need to master the key platforms for data mining, Fox explains.
- Hadoop is a legacy market leader
- Apache Spark is a good starter, and has gained huge popularity on the Web.
“Along with the Spark platforms come powerful machine learning and graph analysis libraries, which the aspiring software developer should learn,” Fox says. “Luckily, Apache Spark came out so recently that anyone who learns it has a head start on the competition.”
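Hadoop is built around the MapReduce programming model, and Spark supports the same map-and-reduce style of processing. The sketch below imitates that model in plain Python on a single machine; it does not use the Hadoop or Spark APIs, and the function names and purchase records are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Emit an (item, 1) pair for every item in a comma-separated record."""
    return [(item, 1) for item in record.split(",")]


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values for each key into a single total."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in grouped.items()}


# Hypothetical purchase records, one basket per line.
records = ["milk,bread", "bread,eggs", "milk,bread,eggs"]
pairs = [pair for record in records for pair in map_phase(record)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```

On a real cluster the map and reduce steps run in parallel across many machines, which is what lets these platforms handle data sets too large for one computer.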
3 Ways to Hone Your Data Mining Skills
- Bootcamps
There are a few data science boot camps that can help you sharpen your data mining skills. Metis is a 12-week data science boot camp in New York, but the price tag is a bit hefty at $14,000. The San Diego Supercomputer Center (SDSC) at the University of California, San Diego (UCSD) also hosts a data mining boot camp series.
- Books and Videos
Code Condo offers a resource list of nine free books for learning data mining and data analysis. This set of video lectures offers a great breakdown of different facets of data mining as well. Coursera, in addition, offers a tutorial series for practicing data mining as a specialty. Also, periodically check out Revolution Analytics, a blog that helps you learn more about using R for big data analysis.
- Hands-on Practice
Another great way of honing data mining skills is to, well, dig into some data. You can download large data sets that are available online. This huge list of data sets open to the public on Quora is another great resource for those of you who want to get your hands dirty in data mining!