When presenting an advanced analytics practice to business executives or sales partners, it is important to help them understand some of the basic concepts. We need to be able to explain in a few sentences what some of the buzzwords are about. That’s the purpose of this blog post.
While I disagree with the statement that big data is just about machine learning, the claim does show how important machine learning is and how widely it is used today. Arthur Samuel gave a really good definition of machine learning in 1959: a “field of study that gives computers the ability to learn without being explicitly programmed.” For years, human beings have been doing a better and better job of programming computers to do things for us. Now we are getting ready to let the machines build algorithms, study processes, and make decisions on their own. This is possible because technology has matured enough to handle large volumes of information in a timely fashion. That’s how machine learning became the jewel in the big data crown.
Machine learning overlaps with statistics. It uses mathematical optimization to build models, analyze data and deliver predictions. Machine learning can be categorized into supervised learning and unsupervised learning. In supervised learning, examples of inputs and their outputs are presented by a human being, the supervisor, and through calculation the computer learns the rules that map the inputs to the outputs. In unsupervised learning, no labeled outputs are provided, and methods are designed for the computer to find the structure of the data on its own, either as a goal in itself or as a means to an end.
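To make the difference concrete, here is a minimal sketch in plain Python. The two functions, `fit_line` and `two_means`, and the toy numbers are my own illustrations, not a standard library API: the first learns a rule from labeled input/output pairs, the second is only given inputs and has to find the groups by itself.

```python
# Supervised learning: a human supplies inputs AND the matching outputs.
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Unsupervised learning: only inputs are given; the code looks for structure.
def two_means(values, rounds=10):
    """Split 1-D values into two groups (k-means with k = 2)."""
    centers = [min(values), max(values)]
    for _ in range(rounds):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])        # labeled examples
centers, groups = two_means([1.0, 1.2, 0.8, 9.9, 10.1, 10.0])  # no labels at all
```

In the supervised half, the "rule" the computer learns is just a slope and an intercept. In the unsupervised half nobody says which value belongs to which group; the two clusters emerge from the data.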
Common problems that can be solved by machine learning are grouped into classification, clustering, regression, density estimation, and dimensionality reduction.
Deep learning is easily confused with machine learning. It is actually a branch of machine learning that learns to represent data in an abstract way. It gets its name from its use of multiple layers of non-linear processing units. The learning can be supervised or unsupervised, and each layer uses the output of the previous layer as its input. The number of layers in deep learning is closely tied to the level of abstraction of the data, since it assumes the observed data are generated by the interactions of factors organized in layers.
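The "each layer feeds the next" idea can be sketched in a few lines of Python. This is only the forward pass, with hand-picked, hypothetical weights and `tanh` chosen as the non-linear unit; a real deep learning system would learn the weights from data, which is not shown here.

```python
import math


def layer(inputs, weights, biases):
    """One layer: each unit takes a weighted sum of its inputs, then applies tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the output of each layer into the next one."""
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations


# Hypothetical weights: 2 inputs -> 3 hidden units -> 1 output unit.
network = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.0]),
]
output = forward([1.0, 2.0], network)
```

Adding another `(weights, biases)` pair to `network` adds another layer, and therefore another level of abstraction built on top of the previous one.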
Deep learning is often described as a rebranding of neural networks, an older idea borrowed from neuroscience. It loosely resembles the way information is communicated and processed in a nervous system, which defines a relationship between various stimuli and the associated responses in the brain.
The most successful family of deep learning models is the artificial neural network (ANN). ANNs have been applied to many problems, such as image classification, language translation and spam identification.
The term artificial intelligence (AI) has the longest history and the broadest meaning. AI mimics the human mind and builds cognitive functions to learn and to solve problems. It uses machine learning and deep learning algorithms. We could say the ultimate goal of AI is to build a machine that can think, talk and behave just like a human (such machines have been depicted vividly in countless books and movies). We are not there yet, but today we have successfully built robots that can chat with us, machines able to beat the best human chess or Go players, and cars that drive themselves.
Pattern recognition is a machine learning method that assigns labels to input values and, in doing so, recognizes the regularities, or patterns, in the data. It aims to give a reasonable explanation of all the available training data, so that pattern matching can then be applied to label new incoming data. We can also identify anomalies, that is, inputs that do not match any of the recognized patterns.
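Here is one simple way to picture this, as a sketch rather than a production method. I use a nearest-centroid rule: each label's "pattern" is the average of its training points, a new point gets the label of the closest pattern, and anything too far from every pattern is flagged as an anomaly. The function names, the toy data and the `max_distance` threshold are all assumptions made for illustration.

```python
import math


def learn_patterns(examples):
    """Average the training points of each label into one centroid per label."""
    sums, counts = {}, {}
    for point, label in examples:
        total = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def recognize(point, centroids, max_distance):
    """Return the label of the nearest centroid, or None if nothing is close enough."""
    label, distance = min(
        ((lbl, math.dist(point, c)) for lbl, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return label if distance <= max_distance else None


training = [
    ([1.0, 1.0], "small"), ([1.2, 0.8], "small"),
    ([8.0, 9.0], "large"), ([9.0, 8.0], "large"),
]
centroids = learn_patterns(training)
known = recognize([1.0, 1.1], centroids, max_distance=2.0)      # matches "small"
anomaly = recognize([50.0, -20.0], centroids, max_distance=2.0)  # matches nothing
```

The threshold is the business decision here: set it too tight and normal data is flagged as anomalous, too loose and real anomalies slip through.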
Feature engineering is the part of machine learning that finds the useful characteristics of the data. We can define many attributes for the data, and the ones that can be used for prediction of any sort are features. Feature engineering is an important part of predictive modeling, and the way the features are defined heavily affects the prediction results. The process involves brainstorming, building, repeatedly validating and improving the features, and it usually involves both data analysts and business users.
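As a small illustration, suppose we have a raw order record and want features for a predictive model. The record layout (`placed_at`, `amount`, `items`) and the chosen features are hypothetical; in practice they would come out of the brainstorming and validation loop described above.

```python
from datetime import datetime
import math


def build_features(order):
    """Turn one raw order record into numeric features for a predictive model."""
    placed = datetime.fromisoformat(order["placed_at"])
    return {
        "hour_of_day": placed.hour,
        "is_weekend": int(placed.weekday() >= 5),   # Saturday = 5, Sunday = 6
        "log_amount": math.log1p(order["amount"]),  # dampens very large amounts
        "items_per_order": len(order["items"]),
    }


# Hypothetical raw record, as it might arrive from an order system.
raw_order = {"placed_at": "2024-03-09T21:15:00", "amount": 120.0, "items": ["a", "b", "c"]}
features = build_features(raw_order)
```

Notice that none of these features exist in the raw data as such. A business user might suggest "weekend orders behave differently", and the analyst turns that hunch into the `is_weekend` column and checks whether it actually improves the prediction.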
If you have read to this point, you must be really interested in advanced analytics. Please stay tuned, as I will explain machine learning tools, including Spark MLlib, in detail in future posts.