Terminologies

Before we fly, let's crawl through some commonly used terminologies.

Input variable (x)

  • The information used to learn

  • There can be more than one input variable

  • For example, "Gender", "Age", and "Salary" in the credit-card example.

Output variable (y)

  • The desired output after learning

  • There is one (and only one) output variable

  • For a discrete output, this can be whether a customer defaults or not in the credit-card example

  • For a continuous output, this can be the amount of credit to give
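The input and output variables above can be sketched in a few lines; the concrete feature values here are made up purely for illustration, not real data.

```python
# Input variables (x) from the credit-card example: "Gender", "Age", "Salary".
# The values are invented for illustration.
x = {"Gender": "F", "Age": 35, "Salary": 52000}

y_discrete = "no default"  # discrete output: defaults or not
y_continuous = 12000       # continuous output: amount of credit to give

print(len(x), y_discrete)
```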

Target function (f)

  • 'f' here refers to the ideal hypothesis

  • As a hypothesis, f maps each sample to its y value; applying f is how the actual training examples are 'generated'

  • f : x → y means f eats some x variables and spits out a y variable

  • f is unknown

I like analogies, especially this one from 3Blue1Brown (I believe): a function merely eats up data and spits out data. Different functions transform data differently, and each such function can interchangeably be referred to as a hypothesis.

Also, I understand that point 2 can be extremely confusing (at least it was for me; I will elaborate on it further in the next topic), but in short, the 'real' set of output variables is generated with the target function f.
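The 'generation' described above can be sketched with a toy stand-in for f. Keep in mind that the real f is unknown; the salary-threshold rule below is purely an assumption made for illustration.

```python
# Hypothetical stand-in for the unknown target function f : x -> y.
# The real f is unknown; this toy rule is an assumption to show how
# training examples are 'generated' by applying f to each input x.
def f(x):
    return "default" if x["Salary"] < 30000 else "no default"

inputs = [{"Salary": 25000}, {"Salary": 60000}]
data = [(x, f(x)) for x in inputs]  # each (x, y) pair is one example
print([y for _, y in data])  # -> ['default', 'no default']
```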

Data (x, y)

  • Generated from the target function f(x)

  • A combination of both input (x) and output (y) variables

$(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$

  • Each instance $(x_n, y_n)$ is represented by its row number n, out of N rows in total
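As a small sketch, the data set is just a list of N pairs $(x_n, y_n)$; the values below are invented for illustration.

```python
# The data set as a list of N pairs (x_n, y_n); n indexes the row.
# Feature values are made up for illustration.
data = [
    ({"Age": 23, "Salary": 25000}, "default"),     # (x_1, y_1)
    ({"Age": 41, "Salary": 60000}, "no default"),  # (x_2, y_2)
    ({"Age": 35, "Salary": 45000}, "no default"),  # (x_3, y_3)
]
N = len(data)      # N = 3 instances
x_1, y_1 = data[0]
print(N, y_1)  # -> 3 default
```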

Hypothesis (g)

  • We can have multiple hypotheses in our hypothesis set (H).

  • However, we select one (and only one) of them to be called g

  • Hence, $g \in H$, or g is in H

  • g is the hypothesis that best approximates f: $g \approx f$
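A hedged sketch of selecting g from a hypothesis set H: score every candidate hypothesis on the data and keep the one that makes the fewest mistakes. The salary thresholds in H (and the data) are assumptions for illustration only.

```python
# Toy data: (salary, label) pairs, invented for illustration.
examples = [(25000, "default"), (60000, "no default"), (45000, "no default")]

def make_h(threshold):
    # Each hypothesis predicts 'default' below a salary threshold.
    return lambda salary: "default" if salary < threshold else "no default"

# Hypothesis set H: three candidate thresholds (assumed values).
H = {t: make_h(t) for t in (20000, 30000, 50000)}

def errors(h):
    # Number of training examples the hypothesis gets wrong.
    return sum(h(x) != y for x, y in examples)

best_t = min(H, key=lambda t: errors(H[t]))
g = H[best_t]  # the one hypothesis we select from H, so that g ≈ f
print(best_t, errors(g))  # -> 30000 0
```

Here the 30000 threshold classifies all three examples correctly, so it becomes g, our chosen approximation of f.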

To put it all together,
