What is Machine Learning?
Machine Learning (ML) is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence (AI). Machine learning uses statistical methods to create programs that either improve their performance over time or detect patterns in massive amounts of data that humans would be unlikely to find.
In short, machine learning is:
⇒ About extracting knowledge from data.
⇒ A research field at the intersection of statistics, artificial intelligence, and computer science, also known as predictive analytics or statistical learning.
Types of Machine Learning Algorithms
Machine learning algorithms are commonly grouped into four types:
- Supervised Learning
- Unsupervised Learning
- Semi-supervised Learning
- Reinforcement Learning
⇒ Supervised learning is the ML task of learning a function that maps an input to an output based on example input-output pairs.
⇒ It is a method in which we teach the machine using labeled data.
⇒ Supervised learning has two main categories of problems: classification and regression.
⇒ The important difference between them is that classification is about predicting a label or class, whereas regression is about predicting a continuous quantity.
⇒ Stock price prediction, face recognition, etc. are some examples of supervised learning.
- Classification is the task of predicting a discrete class label.
- In a classification problem, data is labeled into one of two or more classes.
- A classification problem with only two classes is called binary classification, and one with more than two classes is called multi-class classification.
- Classifying an email as spam or non-spam is an example of a classification problem.
Algorithms that can be used for Binary Classification include:
- Decision Trees
- k-Nearest Neighbors
- Logistic Regression
- Naive Bayes
- Support Vector Machine
Many algorithms that are used in binary classification can also be used for multi-class classification. Algorithms that can be used for Multi-class Classification include:
- Decision Trees
- Gradient Boosting
- k-Nearest Neighbors
- Naive Bayes
- Random Forest
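To make the classification idea concrete, here is a minimal sketch of k-Nearest Neighbors, one of the algorithms listed above, written in plain Python. The toy emails and their two features (number of links, number of exclamation marks) are made up purely for illustration, and the function name `knn_predict` is an assumption, not part of any library.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort the training examples by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Take the labels of the k closest examples and return the most common one.
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical labeled data: ((links, exclamation marks), label).
emails = [
    ((0, 0), "non-spam"),
    ((1, 0), "non-spam"),
    ((0, 1), "non-spam"),
    ((5, 4), "spam"),
    ((6, 5), "spam"),
    ((4, 6), "spam"),
]

print(knn_predict(emails, (5, 5)))  # the three nearest neighbors are all spam
print(knn_predict(emails, (1, 1)))  # the three nearest neighbors are all non-spam
```

The same function works unchanged for multi-class classification, since the majority vote does not care how many distinct labels appear in the training data.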
- Regression is the task of predicting a continuous quantity.
- A regression problem requires the prediction of a quantity.
- A regression problem with multiple input variables is often called a multiple (or, loosely, multivariate) regression problem.
- Predicting the price of a stock over a period of time is a regression problem.
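As a small illustration of regression, the sketch below fits a straight line to a handful of points using ordinary least squares, which is one common way to predict a continuous quantity. The prices are invented numbers, and `fit_line` is a hypothetical helper name; real stock prices are far noisier than this.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: day number and closing price.
days = [1, 2, 3, 4, 5]
prices = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(days, prices)
predicted = slope * 6 + intercept  # predicted price for day 6
print(slope, intercept, predicted)
```

Unlike the classifier, the output here is a number on a continuous scale rather than one of a fixed set of labels.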
⇒ In unsupervised learning, the machine is trained on unlabeled data without any guidance, so there is no prior idea of which types of results to expect.
⇒ Unsupervised learning can be thought of as self-learning where the algorithm can find previously unknown patterns in datasets that do not have any sort of labels.
⇒ Recommender systems, fake news identification, etc. are some examples of unsupervised learning.
⇒ The two main types of unsupervised learning are association analysis and clustering:
- Association analysis is the task of uncovering relationships among data, i.e., discovering patterns in data, finding co-occurrences, and so on.
- An association rule is a model that identifies how the data items are associated with each other.
- A classic example of association rule mining is the relationship between bread and jam: people who buy bread also tend to buy jam. Overall, it is all about finding associations between items that frequently co-occur or items that are similar to each other.
- Approaches to association rule mining are:
- Brute-Force Approach
- Apriori Approach
- Frequent Pattern (FP) Growth Method
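The bread-and-jam rule can be checked with two standard measures that the text above does not name: support (how often the items occur together) and confidence (how often jam is bought given that bread is bought). The sketch below computes both by brute force over a made-up list of shopping baskets; the function names are illustrative assumptions.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets.
baskets = [
    {"bread", "jam"},
    {"bread", "jam", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "jam", "eggs"},
]

print(support(baskets, {"bread", "jam"}))       # 3 of 5 baskets contain both
print(confidence(baskets, {"bread"}, {"jam"}))  # 3 of the 4 bread baskets contain jam
```

Methods such as Apriori and FP-Growth exist to avoid this kind of exhaustive counting when the number of items is large.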
- Clustering is the process of finding similarities between data according to the characteristics found in the data and grouping similar data objects into clusters.
- A good clustering method will produce high-quality clusters with:
- High intra-class similarity
- Low inter-class similarity
- Example: Digital AdWords use a clustering technique to cluster potential buyers into different categories based on their interests and intents.
- Different types of Clustering Algorithms are:
- Hierarchical Method
- Non-hierarchical Method
- The hierarchical method decomposes the database into several levels of nested partitioning, which are represented by a dendrogram.
- The two types of hierarchical method are:
Agglomerative method or Bottom-up
- In this method, objects start in their own separate cluster.
- The two closest (most similar) clusters are then combined, and this is repeated until all the objects are in one cluster.
- In the end, the optimum number of clusters is chosen out of all the cluster solutions.
Divisive method or Top-down
- In this method, all objects start in the same cluster and the above strategy is applied in reverse until every object is in a separate cluster.
- Agglomerative methods are used more often than divisive methods.
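The bottom-up procedure can be sketched in a few lines. This simplified version works on one-dimensional points, measures the distance between two clusters by their closest pair of members (single linkage, an assumption made here), and stops once a requested number of clusters remains instead of merging all the way down to one.

```python
def agglomerative(points, target_clusters):
    """Single-linkage bottom-up clustering of 1-D points (a simplified sketch)."""
    # Every object starts in its own separate cluster.
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the two closest clusters into one.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

print(agglomerative([1.0, 1.5, 2.0, 8.0, 8.5, 9.0], 2))
```

Recording each merge and its distance, rather than stopping early, would give the information needed to draw the dendrogram.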
- The non-hierarchical method is also known as partitioning clustering; K-means clustering is its best-known example.
- This method partitions the database into k clusters, each of which is represented by a representative object (for K-means, the mean of the cluster).
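A minimal K-means sketch looks like this. It assumes one-dimensional points, a fixed number of iterations, and starting centroids chosen by hand; practical implementations usually pick the starting centroids randomly and stop when the assignments no longer change.

```python
def kmeans(points, centroids, iterations=10):
    """Plain K-means on 1-D points; `centroids` holds the k starting centers."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

centers, groups = kmeans([1.0, 1.5, 2.0, 8.0, 8.5, 9.0], centroids=[1.0, 2.0])
print(centers)
print(groups)
```

Here the representative object of each cluster is its mean, which is what gives the method its name.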
⇒ Semi-supervised learning is also known as hybrid learning, and it lies between supervised and unsupervised learning.
⇒ It works with a combination of labeled and unlabeled data.
⇒ It uses a small amount of labeled data and a large amount of unlabeled data, which provides the benefits of both unsupervised and supervised learning while avoiding the challenges of finding a large amount of labeled data.
⇒ We use a semi-supervised learning algorithm to label the unlabeled data and then retrain the model on the newly labeled dataset. We then apply the retrained model to new data, for example to identify fraud more accurately using supervised machine learning techniques.
⇒ Localizing objects, document classification, etc. are some examples of it.
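One simple way to realize the label-then-retrain loop described above is self-training (pseudo-labeling). The sketch below is only one possible approach and makes several assumptions: the data is one-dimensional, the "model" is just the mean value of each class, and every unlabeled point is pseudo-labeled in a single pass. All names and numbers are made up for illustration.

```python
def class_means(labeled):
    """Train a tiny 'model': the mean feature value of each class."""
    sums = {}
    for x, y in labeled:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}

def predict(model, x):
    """Assign x to the class whose mean is closest."""
    return min(model, key=lambda y: abs(x - model[y]))

def self_train(labeled, unlabeled):
    """Label the unlabeled data with the current model, then retrain on everything."""
    model = class_means(labeled)
    pseudo_labeled = [(x, predict(model, x)) for x in unlabeled]
    return class_means(labeled + pseudo_labeled)

# Hypothetical data: two labeled examples and six unlabeled ones.
labeled = [(1.0, "low"), (9.0, "high")]
unlabeled = [1.5, 2.0, 2.5, 7.5, 8.0, 8.5]

model = self_train(labeled, unlabeled)
print(model)
print(predict(model, 4.0))
```

Practical self-training usually keeps only the pseudo-labels the model is confident about and repeats the loop several times, since wrong pseudo-labels can reinforce themselves.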
⇒ Reinforcement learning is a type of machine learning that does not rely on a fixed training data set.
⇒ In this type of learning, an agent interacts with its environment by producing actions and discovers either errors or rewards.
⇒ In reinforcement learning, the key difference is that the input itself depends on the actions we take. For example, in robotics, a robot may start out knowing nothing about its surroundings. After it performs certain actions, it learns more about the environment it is in and decides whether to move right, backward, or forward.
⇒ In the above case, the robot is the agent and its surroundings are the environment. For each action it takes, it may receive a reward or a punishment.
⇒ Thus, the main objective of reinforcement learning algorithms is to map situations to actions so as to yield the maximum cumulative reward.
⇒ Some of the frameworks and techniques used in reinforcement learning are:
- Markov decision process
- Monte-Carlo methods
- Temporal difference methods
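As an illustration of a temporal difference method, here is a minimal sketch of tabular Q-learning. The environment is a hypothetical five-cell corridor in which the agent starts in cell 0 and is rewarded only on reaching cell 4. The learning rate, discount factor, and the choice to explore uniformly at random are all assumptions made to keep the example short.

```python
import random

N_STATES = 5            # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)      # move left or move right
ALPHA, GAMMA = 0.5, 0.9 # learning rate and discount factor (assumed values)

def step(state, action):
    """Environment: return (next_state, reward) for a move in the corridor."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    # Q-table: estimated future reward of taking each action in each state.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)  # explore uniformly at random
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Temporal difference update toward reward + discounted best next value.
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q

q = q_learning()
# Greedy policy: in each non-goal state, pick the action with the highest Q-value.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

The resulting `policy` is exactly the mapping from situations (cells) to actions (left or right) described above; in this corridor the learned policy moves right in every cell, since that is the only way to reach the reward.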