Machine Learning and Artificial Intelligence

Well-designed technology is invisible. In the case of machine learning, the heavy lifting is done behind closed doors inside the black box that is Artificial Intelligence (AI). For over 50 years, researchers have crafted the pieces necessary for machine learning to be useful in our everyday lives. In some ways, they're just getting started.

While it is not necessary to fully understand how machine learning works in order to utilize it, there are some key mathematical concepts that drive much of the technology. For those interested in a deeper understanding of just what machine learning is and how it works, here are some of the key mathematical techniques involved:

Linear and Polynomial Regression

Simple regression is one of the most widely used statistical techniques. Regression measures the relationship between two or more variables, such as correlating rainfall with crop yields, by fitting a line (or, in the polynomial case, a curve) to the data.
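As a minimal sketch of the idea, here is an ordinary least-squares fit of a line to made-up rainfall and yield figures (all numbers are purely illustrative):

```python
# Toy least-squares fit: rainfall (mm) vs. crop yield (illustrative numbers).
rainfall = [100, 200, 300, 400, 500]
yields   = [1.5, 2.1, 2.9, 3.4, 4.2]

n = len(rainfall)
mean_x = sum(rainfall) / n
mean_y = sum(yields) / n

# Closed-form least-squares estimates: slope = cov(x, y) / var(x).
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(rainfall, yields)) \
        / sum((x - mean_x) ** 2 for x in rainfall)
intercept = mean_y - slope * mean_x

# Predicted yield = intercept + slope * rainfall
print(slope, intercept)  # slope ≈ 0.0067, intercept ≈ 0.81
```

Polynomial regression follows the same recipe, except the model fits a curve (a weighted sum of powers of x) instead of a straight line.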

Decision Trees

These tree-like flowcharts use branching nodes to illustrate the possible outcomes of a decision: each internal node tests an attribute of the data, and each leaf represents an outcome.
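A hand-written decision tree can be as simple as nested conditionals. This toy example (invented for illustration) decides whether to water a lawn:

```python
# A tiny hand-coded decision tree: each "if" is a branching node,
# each return value is a leaf (an outcome).
def decide(weather, grass):
    if weather == "rainy":
        return "don't water"   # rain handles it
    if grass == "dry":
        return "water"
    return "don't water"

print(decide("sunny", "dry"))   # → water
```

Machine learning libraries learn trees like this automatically by choosing, at each node, the attribute test that best splits the training data.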

Neural Networks (also called Multilayer Perceptrons)

Artificial neural networks are one of the main pieces of machine learning. As the "neural" part of their name suggests, they are inspired by the way biological nervous systems process information. Each neuron "votes" on the decision outcome (positive or negative), which may then trigger other neurons to vote; the votes are tallied, creating a ranking of the outcomes based on the support each has received (see the Neural Network image below). Neural nets are used to solve many types of problems, such as image recognition, speech recognition, and natural language processing.

As described so well in an article by Luke Dormehl, “For a basic idea of seeing a neural network in action, imagine a factory line. After the raw materials (the data set) are inputted, they are then passed down the conveyor belt, with each subsequent stop or layer extracting a different set of high-level features.”

If the network is intended to recognize an object, it does so in the same way: layer by layer. As Dormehl further illustrates, “The next layer could then identify any edges in the image, based on lines of similar pixels. After this, another layer may recognize textures and shapes, and so on. By the time the fourth or fifth layer is reached, the deep learning net will have created complex feature detectors. It can figure out that certain image elements (such as a pair of eyes, a nose, and a mouth) are commonly found together.”

This is a high-level conceptualization, but if you’re interested in diving deeper into an example of how neural nets can be used to identify handwritten digits, Michael Nielsen’s “Neural Networks and Deep Learning” (here) does a great job of explaining the topic in easy-to-follow detail.
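As a minimal sketch of the "voting" idea, here is a single perceptron trained with the classic perceptron learning rule to compute logical AND (a deliberately tiny example, not Nielsen's digit recognizer):

```python
# A single perceptron "voting" on two binary inputs: learn logical AND.
# Weights start at zero; the perceptron rule nudges them on each mistake.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.0, 0.0]
b = 0.0

def predict(x):
    s = w[0] * x[0] + w[1] * x[1] + b   # weighted "votes" from each input
    return 1 if s > 0 else 0

for _ in range(20):                      # a few passes over the data
    for x, target in data:
        error = target - predict(x)      # 0 when the vote was right
        w[0] += error * x[0]
        w[1] += error * x[1]
        b += error

print([predict(x) for x, _ in data])  # → [0, 0, 0, 1]
```

A deep network stacks many layers of such units, with the "vote" of one layer feeding into the next.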

Bayesian Networks (also known as Belief Networks)

These graphical structures are used to represent knowledge about an uncertain domain. The graph is a probabilistic map of causes and effects where each node represents a random variable, while the edges between the nodes represent probabilistic dependencies.

Here is a simple Bayes Net that illustrates these concepts:

In this simple world, the weather can have three states: sunny, cloudy, or rainy. The grass can be wet or dry, and the sprinkler can be on or off. If it is rainy, then rain will water the grass directly. If it is sunny for an extended period of time, a sprinkler can water the grass indirectly.

When actual probabilities are entered into this net based on real-world weather, lawn, and sprinkler-use behavior, it can help answer a number of useful questions, such as, “If the lawn is wet, what are the chances it was caused by rain or by the sprinkler?” or “If the chance of rain increases, how does that affect the need for watering the lawn via sprinkler?”
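The first question can be answered by enumerating the joint distribution and conditioning on the evidence. All probabilities below are invented for illustration:

```python
# Hypothetical conditional probability tables for the weather/sprinkler/lawn net.
P_weather = {"sunny": 0.6, "cloudy": 0.25, "rainy": 0.15}
# The sprinkler is far more likely to be on when it is sunny.
P_sprinkler_on = {"sunny": 0.5, "cloudy": 0.1, "rainy": 0.01}

def p_wet(weather, sprinkler_on):
    """P(grass wet | its parents): rain or sprinkler (or both) wet the grass."""
    if weather == "rainy" and sprinkler_on: return 0.99
    if weather == "rainy": return 0.9
    if sprinkler_on: return 0.8
    return 0.05

# Enumerate every joint assignment consistent with "grass is wet".
joint_wet = {}
for wthr, pw in P_weather.items():
    for s in (True, False):
        ps = P_sprinkler_on[wthr] if s else 1 - P_sprinkler_on[wthr]
        joint_wet[(wthr, s)] = pw * ps * p_wet(wthr, s)

total = sum(joint_wet.values())
p_rain_given_wet = sum(p for (wthr, s), p in joint_wet.items() if wthr == "rainy") / total
p_sprinkler_given_wet = sum(p for (wthr, s), p in joint_wet.items() if s) / total
print(round(p_rain_given_wet, 3), round(p_sprinkler_given_wet, 3))  # → 0.321 0.621
```

With these (made-up) numbers, a wet lawn is more likely the sprinkler's doing than the rain's; real Bayes net libraries perform this same conditioning with far more efficient algorithms.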

Support Vector Clustering 

The objective of clustering is to organize data into more meaningful collections by partitioning a data set into two or more groups: for example, segmenting customers with similar buying behavior into clusters. Support vector clustering does this using kernel methods borrowed from support vector machines, which allow cluster boundaries of arbitrary shape.
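Support vector clustering itself involves kernel machinery beyond the scope of this post, but the core idea of partitioning data into groups can be sketched with a simpler algorithm, k-means, on invented customer-spend figures:

```python
# Customers' annual spend (illustrative): two natural groups, low and high.
spend = [12, 15, 14, 13, 80, 85, 90, 78]

# One-dimensional k-means with k = 2: alternate between assigning each point
# to its nearest center and moving each center to its cluster's mean.
centers = [min(spend), max(spend)]
for _ in range(10):
    clusters = [[], []]
    for x in spend:
        nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
        clusters[nearest].append(x)
    centers = [sum(c) / len(c) for c in clusters]

print(sorted(centers))  # → [13.5, 83.25]
```

The two centers settle on the low-spend and high-spend segments; support vector clustering reaches a similar partition but can also discover clusters that a nearest-center rule would miss.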

Markov Chains

These are mathematical systems that hop from one “state” (a situation or set of values) to another, and it is assumed that future states depend only on the present state and not the sequence of events preceding it. For example, if you made a Markov chain model of a baby’s behavior, you might include “playing”, “eating”, “sleeping”, and “crying” as states, which together with other behaviors could form a “state space”: a list of all possible states. In addition, a Markov chain tells you the probability of hopping, or “transitioning”, from one state to any other state – e.g., the chance that a baby currently playing will fall asleep in the next five minutes without crying first.
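The baby example can be written directly as a transition table; the numbers below are invented for illustration. Pushing probability mass through the table answers questions like "how likely is the baby to be sleeping two steps from now?":

```python
# Hypothetical transition probabilities for the baby's state space.
# T[s][t] = probability of hopping from state s to state t in one step.
T = {
    "playing":  {"playing": 0.5, "eating": 0.2, "sleeping": 0.2, "crying": 0.1},
    "eating":   {"playing": 0.3, "eating": 0.3, "sleeping": 0.3, "crying": 0.1},
    "sleeping": {"playing": 0.2, "eating": 0.1, "sleeping": 0.6, "crying": 0.1},
    "crying":   {"playing": 0.2, "eating": 0.3, "sleeping": 0.1, "crying": 0.4},
}

def step_distribution(dist):
    """One step of the chain: push probability mass through T."""
    out = {s: 0.0 for s in T}
    for s, p in dist.items():
        for t, q in T[s].items():
            out[t] += p * q
    return out

# Distribution over states two steps after "playing".
dist = {"playing": 1.0}
for _ in range(2):
    dist = step_distribution(dist)
print(dist["sleeping"])  # ≈ 0.29
```

Note the Markov property at work: the table only ever consults the current state, never the history that led to it.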

Convolutional Neural Networks (CNNs)

Convolutional Neural Networks are very similar to ordinary neural nets, but they make the explicit assumption that the inputs are images (or other grid-like data). This lets the network slide the same small filters across every position in the image, which makes image recognition far more efficient.
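The core operation, a convolution, can be sketched in a few lines: slide a small kernel over the image and take a weighted sum at each position. The tiny "image" and edge-detecting kernel below are invented for illustration:

```python
# A 4-wide "image" that is dark on the left and bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# This kernel responds only where brightness changes left-to-right.
kernel = [[1, -1],
          [1, -1]]

def convolve(img, ker):
    """Slide the kernel over the image; each output is a weighted sum."""
    kh, kw = len(ker), len(ker[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            row.append(sum(ker[a][b] * img[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

print(convolve(image, kernel))  # → [[0, -2, 0], [0, -2, 0]]
```

The nonzero middle column marks the vertical edge. A CNN learns the kernel values themselves during training, and stacks many such filters in layers.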

Generative Adversarial Networks (GANs)

This framework, as illustrated by Thalles Santos Silva, uses two different neural networks, a generator and a discriminator, to solve a problem. The generator tries to produce data that looks as if it came from the true training distribution. The discriminator acts like a judge: it decides whether a given input came from the generator or from the real training set.

For example, let’s say you want to go to a concert but don’t have a ticket. The concert has a security check to ensure no one gets in who hasn’t paid (discriminator), so you hire a friend to design a fake ticket (generator).

The friend designs a ticket and shows it to the security check, who immediately calls out the fake. That rejection is feedback telling your friend to keep tweaking the design. This cycle of designing a ticket, showing it to security, and gathering feedback continues until the generator produces a replica that passes the security check.
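This adversarial loop can be sketched in a heavily simplified form. Everything below is invented for illustration: the "true" data is just numbers near 4, the generator has a single parameter, and the discriminator is a one-variable logistic regression rather than a real neural net:

```python
import math
import random

random.seed(0)

TRUE_MEAN = 4.0   # the "real tickets": numbers drawn from around 4
mu = 0.0          # generator parameter: mean of its fake samples
w, b = 0.0, 0.0   # discriminator parameters (the "security check")

def discriminate(x):
    """Probability the discriminator assigns to 'x is real'."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

for step in range(2000):
    real = random.gauss(TRUE_MEAN, 0.5)
    fake = random.gauss(mu, 0.5)

    # Discriminator turn: nudge (w, b) toward labeling real=1, fake=0.
    for x, label in ((real, 1.0), (fake, 0.0)):
        err = label - discriminate(x)
        w += 0.05 * err * x
        b += 0.05 * err

    # Generator turn: nudge mu so its fakes look more "real" to the judge
    # (gradient ascent on log D(fake) with respect to mu).
    mu += 0.05 * (1.0 - discriminate(fake)) * w

print(round(mu, 2))  # mu drifts from 0 toward the true mean of 4
```

As training proceeds, the generator's output distribution drifts toward the real one until the discriminator can no longer tell them apart, which is exactly the ticket-forger story above in miniature.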

Utilizing machine learning does not require a deep understanding of the above topics, but for data scientists a deeper dive is beneficial. Hopefully this list illuminates some of the inner workings of artificial intelligence and how mathematics powers much of the magic behind machine learning.

At swivl we’re dedicated to making working with AI easy and intuitive. The most powerful AI solutions will be those that work in tandem with humans, or, as we like to say, “human-in-the-loop” systems.

Join our newsletter to keep up with all the latest trends on how AI is enhancing the customer experience.