Machine learning, a computerized system that extracts meaningful knowledge from its recognition of patterns in data, is perhaps the best-known subset of AI. It just might be the most exciting, as well, particularly for enterprises seeking to revolutionize and digitize their marketing, operations, and other business systems.
Without rules-based programming, the algorithm “learns” from its experiences and improves with each new computation. But what do you really need to know about how it works? In this article, we’ll take a look at some of the basic foundations and concepts of machine learning and check out a few tools you may find useful as you explore its possibilities.
If At First You Don’t Succeed...
Suppose you’re learning how to play basketball. On your first jump shot attempt, you’re painfully short of the basket. On your second try, you really bend those knees. The ball makes it to the basket but crashes into the backboard before hurtling to the ground.
On your third attempt, you shoot the ball with the proper arc and trajectory while applying the other learned behaviors from your previous attempts. Finally, you’ve made your first jump shot and assumed a Kobe “Black Mamba” Bryant mentality. Notice what happened there? Each new attempt gave you an opportunity to learn and improve your performance for future attempts.
This is the essence of machine learning.
Previously, we would have extracted what we learned from a machine’s failure and reprogrammed it with new rules in an effort to bring us closer to a successful outcome. Now, the machine learns on its own and will continue evolving with each new input, or data, we give it.
Remember How Fun Databases Were?
Data is a first-class citizen in the world of machine learning, so it’s important to be familiar with basic data terminology. The good news is that if you understand how traditional databases work, you already have a pretty good reference point as to how data is structured for machine learning.
A database table has columns and rows to describe the data. Think of a row as a person: the columns describe the person’s attributes (e.g., name, age, gender, occupation). The rows contain the actual values of each attribute (e.g., Angie, 28, female, software engineer).
With machine learning, the structure of data is similar, but the terminology is different. In the illustration above, you can see that the wrapper holding the data is called a ‘dataset’ instead of a table. The attributes of the data are called features, and the entities are called samples.
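To make the terminology concrete, here is a minimal sketch of a dataset in plain Python. The second row and the helper name `get_feature` are made up for illustration. Only the Angie row comes from the example above.

```python
# Feature names play the role of a table's column headers.
feature_names = ["name", "age", "gender", "occupation"]

# Each sample is one row: a value for every feature, in the same order.
samples = [
    ["Angie", 28, "female", "software engineer"],
    ["Marcus", 35, "male", "data analyst"],  # hypothetical row for illustration
]

def get_feature(samples, feature_names, feature):
    """Return the values of one feature across all samples."""
    index = feature_names.index(feature)
    return [sample[index] for sample in samples]

ages = get_feature(samples, feature_names, "age")
print(len(samples), "samples,", len(feature_names), "features")
print("ages:", ages)
```

In practice you would likely reach for a dataframe library rather than nested lists, but the shape is the same: samples down, features across.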
For Best Results, Feed Your Machine Quality Data
More important than the structure of the data is the data itself. In order to make a model that is able to predict accurately, you should intimately know the problem you’d like to solve. This means you’ll need to take a close look at your raw data to ensure it will satisfy all of your requirements. If it does not, you may need to get your hands dirty and perform an exploratory data analysis.
The data you’re working with should be relevant, connected, accurate and plentiful enough to build a fine-tuned model. Let’s take a quick look at how each contributes to producing a better input—and a better output as a result.
Relevant: Does your data provide adequate information to find a solution to the problem you’re trying to solve? Furthermore, do the features of the data capture your desired data points properly? If the answer is no to either question, reevaluate the relevance of your data or transform your data to align more closely with your goals. Example: If you’re trying to build a movie recommendation engine, you probably are not going to care about sporting events data.
Connected: Data that is present and has meaningful values is considered connected. If your data has inconsistent or missing values across a wide array of samples, it is considered disconnected. Later, that could lead to inaccurate predictions.
Accurate: The data measures toward a specific target or targets. Example: If you have a model that classifies images of dogs but notice a lot of chicken images labeled as dogs, the data could be inaccurate.
Plentiful: The dataset has a large enough number of samples to make reasonable predictions.
In machine learning, data is pivotal. It is your responsibility to make sure the data you’re using is intact, reliable and meaningful. Also, don’t underestimate the power of more data. Machine learning models generally work better with larger datasets, as long as the extra data is held to the same quality standard.
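One rough way to check how connected a dataset is: measure the share of missing values per feature. The sketch below assumes each sample is a dictionary and that a missing value is stored as `None`. The sample values and the function name `missing_ratio` are invented for illustration.

```python
def missing_ratio(samples, feature):
    """Fraction of samples whose value for `feature` is missing (None or absent)."""
    if not samples:
        return 0.0
    missing = sum(1 for sample in samples if sample.get(feature) is None)
    return missing / len(samples)

# Made-up samples; None marks a missing value.
samples = [
    {"credit_score": 710, "debt_amount": 1200},
    {"credit_score": None, "debt_amount": 5400},
    {"credit_score": 480, "debt_amount": None},
    {"credit_score": 655, "debt_amount": 300},
]

for feature in ("credit_score", "debt_amount"):
    print(feature, "missing:", missing_ratio(samples, feature))
```

A high ratio for a feature is a hint to fix the data at its source, fill the gaps, or drop the feature before training.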
Training Data vs. Testing Data
When beginning the learning phase, data is split into two parts: training data and test data. Training data builds and refines our model. To rate our model’s performance, we pass in the test data, which the model has not seen during training. A common split of data consists of 75% training data and 25% test data.
Our goal is to build a generalized model that can take in data it has not seen before and still make accurate predictions.
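Here is a minimal sketch of a 75/25 split using only the Python standard library. The function name `split_dataset` and the fixed seed are assumptions made for this example. Machine learning libraries usually ship their own split utilities.

```python
import random

def split_dataset(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and return (training_data, test_data)."""
    shuffled = list(samples)
    # Shuffling first keeps any ordering in the raw data from biasing the split.
    random.Random(seed).shuffle(shuffled)
    test_size = int(round(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]

samples = list(range(100))  # stand-in for 100 real samples
training_data, test_data = split_dataset(samples)
print(len(training_data), len(test_data))  # 75 25
```

The key property is that no sample lands in both parts, so the test data gives an honest read on how the model handles unseen input.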
Machine Learning Concepts: Supervised, Unsupervised, and Reinforcement
Most machine learning algorithms fall under one of the learning concepts listed below:
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
When you know enough about your data, you can guide the machine with labeled samples. In supervised learning, we have both input variables and an output variable. The algorithm learns how to map from one to the other.
Let’s say we are building a model for a fraud detection dataset. In addition to the features (such as credit score, debt amount, rating and so on), we’ll have labeled data indicating if the sample is “fraud” or “not fraud.” This simple example calls for a binary classification algorithm because each data point is either fraudulent or not.
The labels let the machine figure out relationships between the features and the outcome. For example, a machine may notice that samples in our fraud system with a credit score of 500 or less and a debt amount of $5,000 or more are more often labeled as fraud. Machines are great at pattern recognition. They can figure out how certain features relate to one another and the impact that has on your model’s final output.
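To see the idea in code, here is a small sketch of a binary classifier for the fraud example. It uses 1-nearest-neighbor, chosen here only because it fits in a few lines, not because it is the right tool for real fraud detection. The training samples and the scaling ranges are made up.

```python
import math

# Made-up labeled samples: (credit_score, debt_amount) -> label.
training_data = [
    ((450, 8000), "fraud"),
    ((480, 6500), "fraud"),
    ((500, 5200), "fraud"),
    ((700, 1000), "not fraud"),
    ((720, 3000), "not fraud"),
    ((650, 500), "not fraud"),
]

def scale(features):
    """Put both features on a roughly comparable 0-1 scale (assumed ranges)."""
    credit_score, debt_amount = features
    return (credit_score / 850, debt_amount / 10000)

def predict(features, training_data):
    """Return the label of the closest training sample (1-nearest neighbor)."""
    point = scale(features)
    nearest = min(training_data, key=lambda item: math.dist(point, scale(item[0])))
    return nearest[1]

print(predict((470, 7000), training_data))  # fraud
print(predict((690, 800), training_data))   # not fraud
```

Notice that nobody wrote a rule like "score under 500 means fraud." The prediction falls out of the labeled samples alone.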
Learning from unlabeled examples is the key to unsupervised learning, where we have input variables but no corresponding known output.
When a baby begins to talk, for example, much of their learning happens through trial and error. First, the child learns to coo. They begin playing and experimenting with different sounds. Eventually, coos turn into words as they continue studying and observing their environment. Before you know it, full sassy sentences are spilling out of your child’s mouth.
Unsupervised learning operates in a similar fashion. The machine studies the data and makes observations. Output results cluster together based on commonalities found in a feature or set of features. Again, machines will do their best to work out the relationships without the human guidance normally found in supervised learning.
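Clustering is easier to picture with a small example. The sketch below runs a bare-bones k-means on a single feature. The purchase amounts are invented, and the way the starting centers are picked is a simplification chosen to keep the example deterministic.

```python
def k_means_1d(values, k=2, iterations=10):
    """Group numbers into k clusters by repeatedly moving each center to its cluster's mean."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Made-up purchase amounts with no labels attached.
purchases = [12, 15, 14, 10, 95, 102, 99, 110]
centers, clusters = k_means_1d(purchases)
print(centers)   # [12.75, 101.5]
print(clusters)  # [[12, 15, 14, 10], [95, 102, 99, 110]]
```

We never told the machine which purchases were "small" or "large." It found the two groups from the values themselves.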
Reinforcement learning is one of the trendier learning concepts. For the record, I’m a dog person. It seems like only yesterday that we took our pet dog, Ella, to a few training sessions at a local pet store. We learned about a dog training technique called positive reinforcement. The principle behind this technique is that anytime your pet performs the desired behavior, you immediately praise and reward them. Your pet will soon realize that if it sits when the command is given, it will get yummy treats.
Machines are able to adopt this same learning technique.
With reinforcement learning, the machine figures out which actions yield the greatest rewards. Taking the correct action rewards the machine. It’s in the machine’s best interest to improve its performance as quickly and efficiently as possible. The idea here is that the machine will acquire many rewards if it follows a good policy, making it a happy little singularity.
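A tiny sketch can show the reward loop at work. Below, an agent picks among three actions using an epsilon-greedy policy: mostly take the action that has paid best so far, occasionally try a random one. The reward values are made up and fixed per action to keep the example predictable. Real environments are usually noisy.

```python
import random

def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)  # learned value of each action
    counts = [0] * len(rewards)       # how often each action was taken
    total_reward = 0.0
    for step in range(steps):
        if step < len(rewards):
            action = step  # try every action once before trusting the estimates
        elif rng.random() < epsilon:
            action = rng.randrange(len(rewards))  # explore
        else:
            action = max(range(len(rewards)), key=lambda a: estimates[a])  # exploit
        reward = rewards[action]
        counts[action] += 1
        # Incremental average of the rewards seen so far for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward

# Hypothetical rewards: action 2 is the "sit on command" that earns the best treat.
estimates, counts, total_reward = run_bandit([0.2, 0.5, 0.9])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("best action:", best_action)  # best action: 2
```

The agent is never told which action is best. It learns that from the rewards, much like Ella learned that sitting earns a treat.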
Machine Learning Algorithms
There are many machine learning algorithms out there, and they’re typically grouped either by learning style or by their similarities in form or function. Jason Brownlee, PhD, offers an excellent tutorial on machine learning algorithms to help you get familiar with your options. In it, he shares a mind map of over 60 machine learning algorithms, grouped by type.
Your algorithm choice is strongly influenced by the problem you’re trying to solve and how you’re trying to solve it. Be sure you fully understand the problem you intend to solve. Once you understand the problem, you can confidently reach into the algorithm toolbox to find the proper solution.
Learn More About Machine Learning
Machine learning is a great way to solve complex problems without the cumbersome approach of traditional software. Now that you’ve learned the basics and understand how it works, here are some great resources for really digging into it:
I find myself fascinated with machine learning already, but have only begun to get my feet wet. As a practicing software engineer, I always thought the only thing that would limit me in this field was my imagination. When it comes to machine learning and the broader scope of AI, I now believe that our imaginations are only the beginning!