# Learning Model

![ord lo](https://mustsharenews.com/wp-content/uploads/2019/06/amino.jpg)

The learning model consists of 2 components:

1. **Hypothesis Set**
2. **Learning Algorithm**

## 1. Hypothesis Set

First, an **ML model** defines the set of hypotheses. Examples of models include:

* Linear Classification
* Linear Regression
* Logistic Regression

Depending on the **learning scenario**, one model may be favored over another.

In the credit-card approval case, the '**Perceptron Learning Model**' is the preferred model for **Linear Classification**. Each applicant is described by **d** input attributes (the **x variables**), e.g. "gender", "age", "salary":

$$
x = (x_1, x_2, \ldots, x_d)
$$

The bank should **approve credit** if

$$
\sum_{i=1}^{d}w_ix_i > \text{threshold}
$$

and **deny credit** if

$$
\sum_{i=1}^{d}w_ix_i < \text{threshold}
$$

Both conditions can be **combined** and rewritten as

$$
h(x)=sign((\sum_{i=1}^{d}w_ix_i) - \text{threshold})
$$

where $$h(x) = +1$$ means approve and $$h(x) = -1$$ means deny, with each **hypothesis** looking like:

![PLA](/files/-Ly8m0mvbm8nr6yVaNOF)
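To make the formula concrete, here is a minimal Python sketch of a single hypothesis. The attribute values, weights, and threshold are made up for illustration, and treating a score exactly equal to the threshold as a denial is an assumption (the formulas above leave that case open).

```python
def sign(value):
    """Return +1 for a positive value, -1 otherwise (0 counts as -1: an assumption)."""
    return 1 if value > 0 else -1


def perceptron_hypothesis(x, w, threshold):
    """h(x) = sign((sum of w_i * x_i) - threshold)."""
    score = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return sign(score - threshold)


# Hypothetical applicant described by two numeric attributes,
# with made-up weights and threshold.
applicant = [30, 50]
weights = [2, 3]
decision = perceptron_hypothesis(applicant, weights, threshold=100)

# score = 2*30 + 3*50 = 210, which exceeds 100
print("approve" if decision == 1 else "deny")
```

Changing `weights` or `threshold` gives a different hypothesis from the same model.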

### Control

As mentioned, the model creates the hypothesis set (H). However, there are **not many parameters we can control** after picking a model. For example, we cannot simply alter the formula of the Perceptron Learning Model to produce more hypotheses, because that would change the logic of the model itself.

![Saitama controlling a rock by pinching his nose](https://vignette.wikia.nocookie.net/onepunchman/images/e/ea/6c44ee22b0faba1f-938x512.png/revision/latest?cb=20190716151417)

In fact, for this model there are only **2 kinds of parameters** within our control:

#### Weight (w)

* One weight is attached to each **input variable (x)**
* In this case, the **more important** a variable is for approving a credit-card application, the larger its weight should be, so that it has more influence on whether the applicant's score exceeds the threshold

#### Threshold

* This can be an **arbitrary value/constant** that each candidate's weighted sum of scores must exceed for the application to be approved
* This can also be used as a **means to an end**, for example by setting the threshold so that at most 250,000 credit-card applications are approved

By **altering these 2 parameters**, we can create multiple hypotheses, which together form the **hypothesis set (H)** from which the **best hypothesis (g)** is eventually picked.

### Masking Threshold

The threshold is very much like a weight. Both are *parameters* (values that change) to be tweaked. Hence, we can treat the threshold as a special 'weight' $$w_0 = -\text{threshold}$$ to greatly simplify the formula from

$$
h(x)=sign((\sum_{i=1}^{d}w_ix_i) - \text{threshold})
$$

to

$$
h(x)=sign(\sum_{i=1}^{d}w_ix_i+w_0)
$$

and, by introducing a constant input $$x_0 = 1$$ so that $$w_0$$ can be grouped into the summation, to

$$
h(x)=sign(\sum_{i=0}^{d}w_ix_i)
$$
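As a quick sanity check, the sketch below compares the threshold form with the masked form. It assumes the convention that $$w_0$$ equals the negated threshold and that a constant input $$x_0 = 1$$ is prepended to every x. The function names, weights, and sample points are made up for illustration.

```python
def sign(value):
    """Return +1 for a positive value, -1 otherwise (0 counts as -1: an assumption)."""
    return 1 if value > 0 else -1


def h_with_threshold(x, w, threshold):
    """Original form: sign((sum_{i=1..d} w_i * x_i) - threshold)."""
    return sign(sum(w_i * x_i for w_i, x_i in zip(w, x)) - threshold)


def h_masked(x, w_aug):
    """Masked form: sign(sum_{i=0..d} w_i * x_i), with x_0 = 1 prepended."""
    x_aug = [1] + list(x)
    return sign(sum(w_i * x_i for w_i, x_i in zip(w_aug, x_aug)))


weights = [2, 3]                  # w_1, w_2 (made-up values)
threshold = 100
w_aug = [-threshold] + weights    # w_0 = -threshold

samples = [[30, 50], [10, 10], [20, 20]]
results = [(h_with_threshold(x, weights, threshold), h_masked(x, w_aug))
           for x in samples]

# Each pair holds the same value twice: both forms agree on every sample.
print(results)
```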

### Vectorizing

Do take a look at my GitBook [**Matrices and Linear Algebra Fundamentals**](https://tony-ng-1.gitbook.io/matrices-and-linear-algebra-fundamentals/) if you do not know what matrices are!

At this juncture, the formula of the model we're using is pretty neat, but its major problem is that its algebraic computation can be pretty **brutal** to implement. To illustrate this, let's take Alice from our example.

|  Name | Age | Gender | Salary | Debt | Default |
| :---: | :-: | :----: | :----: | :--: | :-----: |
| Alice |  23 | Female |  24000 |   -  |    1    |

By applying our model's formula and slotting in arbitrary weights, we'll get

$$
h(Alice) = sign((-3.2\times 23)+(1.5\times 0)+(0.25\times 24000)+(-1.5\times 0))
$$

However, if we **vectorize** both the **weights** and **input variable x**, we'll get

$$
h(Alice) = sign\left(
\left[\begin{array}{cccc}
-3.2 & 1.5 & 0.25 & -1.5
\end{array}\right]
\left[\begin{array}{c}
23 \\ 0 \\ 24000 \\ 0
\end{array}\right]
\right)
$$

Ok, in all honesty it does not look that bad LOL, but the algebraic form will probably look **worse** with more input variables. Hence, the vectorized formula is written as

$$
h(x) = sign(w^T x)
$$
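Here is a small Python sketch of the vectorized form using Alice's row and the arbitrary weights from above. It uses plain lists and a hand-written `dot` helper (a hypothetical name) to stay dependency-free. The encoding of Gender and Debt as 0 follows the algebraic formula above.

```python
def dot(w, x):
    """Dot product w^T x of two equal-length sequences."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


def sign(value):
    """Return +1 for a positive value, -1 otherwise (0 counts as -1: an assumption)."""
    return 1 if value > 0 else -1


def h(x, w):
    """Vectorized hypothesis: h(x) = sign(w^T x)."""
    return sign(dot(w, x))


w = [-3.2, 1.5, 0.25, -1.5]      # arbitrary weights from the example
alice = [23, 0, 24000, 0]        # age, gender (encoded 0), salary, debt (encoded 0)

score = dot(w, alice)            # roughly 5926.4
decision = h(alice, w)           # positive score, so +1
print(score, decision)
```

With more input variables, only the two lists grow. The `h` function stays the same.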

## 2. Learning Algorithm

The learning algorithm simply **updates** the weights.

For the Perceptron Learning Algorithm (PLA), this means correcting **misclassified points**, i.e. points where

$$
sign(w^Tx_n) \neq y_n
$$

where $$y_n$$ is the label given by the target function.
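The misclassification check can be sketched in a few lines of Python. The toy points (each with a leading constant input of 1), labels, and weights below are made up for illustration, and `misclassified` is a hypothetical helper name.

```python
def dot(w, x):
    """Dot product w^T x."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


def sign(value):
    """Return +1 for a positive value, -1 otherwise (0 counts as -1: an assumption)."""
    return 1 if value > 0 else -1


def misclassified(w, points, labels):
    """Return the indices n for which sign(w^T x_n) != y_n."""
    return [n for n, (x_n, y_n) in enumerate(zip(points, labels))
            if sign(dot(w, x_n)) != y_n]


points = [[1, 2, 1], [1, -1, -2], [1, 1, -3]]   # toy inputs, x_0 = 1
labels = [1, -1, 1]                             # y_n from the target function
w = [0, 1, 1]                                   # current weights

wrong = misclassified(w, points, labels)
# Only the third point disagrees: w^T x = 1 - 3 = -2, but its label is +1.
print(wrong)
```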

### Negative to positive

For example,

* $$sign(w^Tx_n)=-1$$ from the **hypothesis**, while
* $$y_n = +1$$ from the **target function**,
* so point n is **misclassified**.

Recall that the sign of the dot product of two vectors depends on the angle $$\theta$$ between them:

* $$\theta <90\degree$$ (acute): the product is **positive**,
* $$\theta = 90\degree$$ (right): the product is **0**,
* $$\theta > 90\degree$$ (obtuse): the product is **negative**.

![https://www.quora.com/Can-a-scalar-product-be-negative](https://qph.fs.quoracdn.net/main-qimg-eff9b21e0b8061546e9a661b662d2860)

The dot product of vector $$w_1$$ and $$x_n$$ is **negative** (the angle between them is obtuse), whereas it should be positive.

![Vectors of w\_1 and x](/files/-LyA3YTMfvfIqwydMRFo)

Hence, the Perceptron Learning Algorithm corrects the misclassified point with

$$
w_2 \leftarrow w_1+y_nx_n
$$

Since $$y_n = +1$$, we are simply adding $$x_n$$ to the weight vector, where

$$
w_2 \leftarrow w_1+x_n
$$

With the new vector $$w_2$$ being

![Vector w\_2](/files/-LyA6HVOQjEf9nZroG9T)

The new angle $$\theta_2$$ formed by vector $$w_2$$ and $$x_n$$ is $$<90\degree$$, and the Perceptron Learning Algorithm has updated the weight to correctly classify point n. 😵
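A tiny numeric sketch of this update follows, with made-up 2D vectors. It also shows why the update pushes in the right direction: the dot product with the misclassified point changes by exactly its label times the squared length of the point. In this example one update is enough. In general a point may need several updates before its sign flips.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def pla_update(w, x_n, y_n):
    """Apply the update rule w <- w + y_n * x_n and return the new weights."""
    return [w_i + y_n * x_i for w_i, x_i in zip(w, x_n)]


w_1 = [-2, 1]     # made-up starting weights
x_n = [2, 1]      # made-up misclassified point
y_n = 1           # target label is +1

before = dot(w_1, x_n)             # -4 + 1 = -3, so the hypothesis says -1
w_2 = pla_update(w_1, x_n, y_n)    # [-2 + 2, 1 + 1] = [0, 2]
after = dot(w_2, x_n)              # 0 + 2 = 2, so the hypothesis now says +1

# The dot product grew by ||x_n||^2 = 2*2 + 1*1 = 5.
print(before, w_2, after)
```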

### Positive to negative

The reverse is true as well.

*-just skip the following if you understand the previous lol-*

Given the scenario where

* $$sign(w^Tx_n)= +1$$ from the **hypothesis**, while
* $$y_n = -1$$ from the **target function**,
* $$sign(w^Tx_n) \neq y_n$$, so point n is **misclassified**.

Here the dot product of vector $$w_1$$ and $$x_n$$ is currently **positive** (the angle between them is acute), whereas it should be negative.

![Vectors of w\_1 and x](/files/-LyE_vSRdYKYGc55hHqZ)

Hence, the Perceptron Learning Algorithm corrects the misclassified point with

$$
w_2 \leftarrow w_1+y_nx_n
$$

Since $$y_n = -1$$, we are simply subtracting $$x_n$$ from the weight vector, where

$$
w_2 \leftarrow w_1-x_n
$$

With the new vector $$w_2$$ being

![Vector w\_2](/files/-LyEcVU3zPCgvnWvbFws)

The new angle $$\theta_2$$ formed by vector $$w_2$$ and $$x_n$$ is $$>90\degree$$, and the Perceptron Learning Algorithm has updated the weight to correctly classify point n. 😵
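To tie the update back to the angle picture, the sketch below computes the angle between the weight vector and the point before and after the update. The 2D vectors are made up, and `angle_degrees` is a hypothetical helper built on the standard `math` module.

```python
import math


def angle_degrees(a, b):
    """Angle between two non-zero vectors, in degrees."""
    dot = sum(a_i * b_i for a_i, b_i in zip(a, b))
    norm_a = math.sqrt(sum(a_i * a_i for a_i in a))
    norm_b = math.sqrt(sum(b_i * b_i for b_i in b))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))


w_1 = [1, 2]      # made-up starting weights
x_n = [2, 1]      # made-up misclassified point
y_n = -1          # target label is -1

theta_1 = angle_degrees(w_1, x_n)                       # acute: hypothesis says +1
w_2 = [w_i + y_n * x_i for w_i, x_i in zip(w_1, x_n)]   # w_1 - x_n = [-1, 1]
theta_2 = angle_degrees(w_2, x_n)                       # obtuse: hypothesis says -1

print(round(theta_1, 2), w_2, round(theta_2, 2))
```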

### Iteration

**If** the training examples are truly *linearly separable*, repeating the above update on misclassified points will eventually classify all points correctly. This holds even though a single update may cause previously correctly classified points to become misclassified.

![PLA hard at work](https://dailypicdump.com/media/20161017/me-trying-to-help.gif)

That is **if** the data set is truly linearly separable. If not, we can attempt to transform the data to make it linearly separable.
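Putting the pieces together, here is a minimal sketch of the whole iteration on a tiny, made-up, linearly separable data set. Starting from all-zero weights, always picking the first misclassified point, and capping the number of updates are assumptions of this sketch, not requirements of the algorithm.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def sign(value):
    """Return +1 for a positive value, -1 otherwise (0 counts as -1: an assumption)."""
    return 1 if value > 0 else -1


def train_pla(points, labels, max_updates=1000):
    """Repeat w <- w + y_n * x_n on a misclassified point until none remain."""
    w = [0] * len(points[0])
    for _ in range(max_updates):
        wrong = [n for n, (x_n, y_n) in enumerate(zip(points, labels))
                 if sign(dot(w, x_n)) != y_n]
        if not wrong:
            return w                      # every point is classified correctly
        n = wrong[0]                      # assumption: take the first one
        w = [w_i + labels[n] * x_i for w_i, x_i in zip(w, points[n])]
    raise RuntimeError("no separating weights found within max_updates")


# Toy data with x_0 = 1 prepended to each point.
points = [[1, 1, 3], [1, 2, 3], [1, 3, 1], [1, 3, 2]]
labels = [1, 1, -1, -1]

w = train_pla(points, labels)
predictions = [sign(dot(w, x_n)) for x_n in points]
print(w, predictions)
```

On this data the first update fixes the two positive points but breaks the two negative ones, and the second update fixes everything, which is the back-and-forth described above. The `max_updates` cap is there because the loop would never finish on data that is not linearly separable.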

