Terminologies

Before we fly, let's crawl through some commonly used terminologies.

Input variable (x)

  • The information used to learn

  • There can be more than one input variable

  • For example, "Gender", "Age", and "Salary" in the credit-card example.

Output variable (y)

  • The desired output after learning

  • There is one (and only one) output variable

  • For a discrete output, this can be whether a customer defaults or not in the credit-card example

  • For a continuous output, this can be the amount of credit to give
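The input and output variables above can be sketched in a few lines; the concrete feature values here are made up purely for illustration, not real data.

```python
# Input variables (x) from the credit-card example: "Gender", "Age", "Salary".
# The values are invented for illustration.
x = {"Gender": "F", "Age": 35, "Salary": 52000}

y_discrete = "no default"  # discrete output: defaults or not
y_continuous = 12000       # continuous output: amount of credit to give

print(len(x), y_discrete)
```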

Target function (f)

  • 'f' here refers to the ideal hypothesis

  • As a hypothesis, f maps each sample to its y value; applying f is how the actual training examples are 'generated'

  • f : x → y means f eats some x variables and spits out a y variable

  • f is unknown

I like analogies, especially this one from 3Blue1Brown (I believe): a function merely eats up data and spits out data. Different functions transform data differently, and each such function can interchangeably be referred to as a hypothesis.

Also, I understand that point 2 can be extremely confusing (at least it was for me; I will elaborate on it further in the next topic), but in short, the 'real' set of output variables is generated with the target function f.
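The 'generation' described above can be sketched with a toy stand-in for f. Keep in mind that the real f is unknown; the salary-threshold rule below is purely an assumption made for illustration.

```python
# Hypothetical stand-in for the unknown target function f : x -> y.
# The real f is unknown; this toy rule is an assumption to show how
# training examples are 'generated' by applying f to each input x.
def f(x):
    return "default" if x["Salary"] < 30000 else "no default"

inputs = [{"Salary": 25000}, {"Salary": 60000}]
data = [(x, f(x)) for x in inputs]  # each (x, y) pair is one example
print([y for _, y in data])  # -> ['default', 'no default']
```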

Data (x, y)

  • Generated from the target function f(x)

  • A combination of both input (x) and output (y) variables

$(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$

  • Each instance $(x_n, y_n)$ is represented by its row number n, out of N rows in total
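As a small sketch, the data set is just a list of N pairs $(x_n, y_n)$; the values below are invented for illustration.

```python
# The data set as a list of N pairs (x_n, y_n); n indexes the row.
# Feature values are made up for illustration.
data = [
    ({"Age": 23, "Salary": 25000}, "default"),     # (x_1, y_1)
    ({"Age": 41, "Salary": 60000}, "no default"),  # (x_2, y_2)
    ({"Age": 35, "Salary": 45000}, "no default"),  # (x_3, y_3)
]
N = len(data)      # N = 3 instances
x_1, y_1 = data[0]
print(N, y_1)  # -> 3 default
```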

Hypothesis (g)

  • We can have multiple hypotheses in our hypothesis set (H).

  • However, we select one (and only one) of them to be called g

  • Hence, $g \in H$, or g is in H

  • g is the hypothesis that best approximates f: $g \approx f$
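A hedged sketch of selecting g from a hypothesis set H: score every candidate hypothesis on the data and keep the one that makes the fewest mistakes. The salary thresholds in H (and the data) are assumptions for illustration only.

```python
# Toy data: (salary, label) pairs, invented for illustration.
examples = [(25000, "default"), (60000, "no default"), (45000, "no default")]

def make_h(threshold):
    # Each hypothesis predicts 'default' below a salary threshold.
    return lambda salary: "default" if salary < threshold else "no default"

# Hypothesis set H: three candidate thresholds (assumed values).
H = {t: make_h(t) for t in (20000, 30000, 50000)}

def errors(h):
    # Number of training examples the hypothesis gets wrong.
    return sum(h(x) != y for x, y in examples)

best_t = min(H, key=lambda t: errors(H[t]))
g = H[best_t]  # the one hypothesis we select from H, so that g ≈ f
print(best_t, errors(g))  # -> 30000 0
```

Here the 30000 threshold classifies all three examples correctly, so it becomes g, our chosen approximation of f.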

To put it all together,
