‘Deep Learning’

A trained model can be treated just like a regular computer program.

What is a program? A set of instructions given to a computer. Programmers spend their time writing specific instructions for a computer to follow in order to achieve a specific task. If anything goes wrong in the program, it is almost certainly due to a ‘bad’ instruction (a bug). So the programmer has to backtrack and use one or more debugging techniques to find the fault, correct it, and then watch the program work as expected.

In 1949, a man named Arthur Samuel started working on a different way to get computers to complete tasks, which he called machine learning. The idea is that instead of writing specific instructions for a computer to follow to perform a task, we write a template (an architecture) for a possible program. Then we write instructions for the computer to discover the best way to complete that template so that it performs as well as possible on the task. The full program (set of instructions) that results once the machine has completed the template is called a model. The values that are filled in to complete the architecture are called parameters or weights. The process of discovering the best parameters is called training the model, and the outputs of the resulting program are called predictions.
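
To make this vocabulary concrete, here is a minimal sketch. It assumes the task is predicting one number from another and that the architecture is a straight line. The names `architecture` and `make_model`, and the parameter values, are hypothetical and chosen only for illustration.

```python
from typing import Callable


def architecture(x: float, weight: float, bias: float) -> float:
    """The template: a line whose slope and intercept are left open."""
    return weight * x + bias


def make_model(weight: float, bias: float) -> Callable[[float], float]:
    """Fill in the parameters to turn the template into a complete program (a model)."""
    def model(x: float) -> float:
        return architecture(x, weight, bias)
    return model


# Illustrative parameter values picked by hand; training would normally find them.
model = make_model(weight=2.0, bias=1.0)

# The model's output for a given input is its prediction: 2.0 * 3.0 + 1.0
prediction = model(3.0)
```

The same `architecture` with different parameters gives a different program, which is why finding good parameters matters so much.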

So the question now is how to get the machine to discover the right parameters to fill in the template (training). We do this by writing instructions (a loss function) for the machine to measure how well a model has performed on the task, and then instructions for the computer to choose parameters that perform better than the ones it has previously tested (this is possible using a mathematical process known as gradient descent).
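
The training loop described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy data, the straight-line architecture, the mean squared error loss, the learning rate, and the step count are all choices made for this example, and real deep learning libraries compute the gradients automatically rather than by hand.

```python
# Toy data generated from y = 2x + 1 (an assumption made for this example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def predict(x, weight, bias):
    """The architecture: a straight line with two open parameters."""
    return weight * x + bias


def loss(weight, bias):
    """Mean squared error over the data: lower means better predictions."""
    errors = [(predict(x, weight, bias) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def gradients(weight, bias):
    """Slope of the loss with respect to each parameter, derived by hand."""
    n = len(xs)
    grad_w = sum(2 * (predict(x, weight, bias) - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (predict(x, weight, bias) - y) for x, y in zip(xs, ys)) / n
    return grad_w, grad_b


def train(steps=2000, learning_rate=0.05):
    """Gradient descent: repeatedly nudge the parameters in the direction that lowers the loss."""
    weight, bias = 0.0, 0.0  # arbitrary starting guess
    for _ in range(steps):
        grad_w, grad_b = gradients(weight, bias)
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


weight, bias = train()
```

With these settings the parameters end up very close to the values used to generate the data (a weight of 2 and a bias of 1), and the loss is far lower than it was at the starting guess.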