# Characteristics of ML

> “Out of every one hundred men, ten shouldn't even be there, eighty are just targets, nine are the real fighters, and we are lucky to have them, for they make the battle. Ah, but the one, one is a warrior, and he will bring the others back.” - Heraclitus

The way I see it, industries slap the ML label on their products because it is a trendy, esoteric concept that few understand. Meanwhile, others believe that plucking a few intuitive variables off the top of their head, or pasting in a best-fit line, amounts to ML.

Simply no. Like all things in the world, where dolphins belong in the sea (not the zoo) and humans on land, there are areas where ML excels and areas where it doesn't.

## Characteristics of ML

This should probably be labelled *'guidelines for a successful machine learning scenario'*. They are:

* A pattern or trend is likely to exist
* There is no mathematical proof (no exact formula that already gives the answer)
* Lots of input data

## 1. Existing pattern or trend

At its simplest: if you cannot logically convince yourself that the subject matter is predictable, the machine is unlikely to be able to predict it either.

**Scenarios where a pattern or trend may not exist:**

* Output of an RNG (random number generator) program

**Scenarios where a pattern or trend may exist:**

* Price of a stock
* Probability of cancer
* Score of a test

![Einstein about to crack the slot machines code with ML and strike rich 🤑](https://www.gamblingsites.org/wp-content/uploads/2019/11/Slot-Machines-at-a-Casino-Man-Thinking.jpg)

Despite having a gazillion rows of data from an RNG program, we are unlikely to be able to predict the very next number it generates (assuming it is not a bad program that defeats its very purpose; do some research on 'true random' vs 'pseudo random' if you're interested).
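To make this concrete, here is a minimal sketch using only Python's standard library. It compares how strongly each value is related to the one that follows it, first for RNG output and then for a made-up series with an upward trend. The helper `lag1_correlation`, the seed, and the trend series are all assumptions chosen for illustration, not a formal test of randomness.

```python
import random

def lag1_correlation(values):
    """Pearson correlation between each value and the value that follows it."""
    xs, ys = values[:-1], values[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Plenty of rows, but no pattern: plain RNG output.
random_draws = [rng.random() for _ in range(100_000)]

# Hypothetical series with a pattern: a steady upward trend plus noise.
trend = [0.001 * i + rng.gauss(0, 1) for i in range(100_000)]

print("RNG output  :", lag1_correlation(random_draws))
print("Trend series:", lag1_correlation(trend))
```

The correlation for the RNG output should sit very close to zero, meaning the previous number tells us next to nothing about the next one, while the trend series should score close to one. More rows do not help when there is nothing to learn.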

## 2. No mathematical proof

The quantity we want to predict must not be something that can already be derived mathematically.

For example, given the **radius**, we can calculate exactly both the

* Area of the circle, and
* Circumference of the circle

![nani is your area or circumference plsz?!](https://cdn.dribbble.com/users/2863981/screenshots/6133983/image.png)

Hence, there will **never** be a need to use ML to predict the area of a circle from its radius, since the formula already gives the exact answer.
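As a quick illustration, the two formulas fit in a couple of lines of Python. The function names here are just illustrative; the point is that no data and no training are involved.

```python
import math

def circle_area(radius):
    """Exact area from the formula pi * r^2."""
    return math.pi * radius ** 2

def circle_circumference(radius):
    """Exact circumference from the formula 2 * pi * r."""
    return 2 * math.pi * radius

print(circle_area(2.0), circle_circumference(2.0))
```

Any model trained on (radius, area) pairs could at best approximate what the formula gives for free.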

## 3. Large data set

The underlying logic comes from statistics and is elaborated further in the next topic. It goes like this:

* The larger the dataset we have (sample size),
* The better the dataset is able to represent the population (Law of Large Numbers),
* The better the machine is able to learn about the **population** with the help of this large **sample**

With a small data set, we likely can't say anything conclusive about the population, even with a groundbreaking hypothesis.
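The Law of Large Numbers is easy to see with a fair six-sided die, whose true average roll is 3.5. The sketch below is a toy simulation (the seed and sample sizes are arbitrary choices) that prints the sample average for increasingly large samples.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

def mean_of_rolls(n):
    """Average of n rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

# The population mean of a fair die is 3.5.
for n in (10, 1_000, 100_000):
    print(n, round(mean_of_rolls(n), 3))
```

With only 10 rolls the average can land quite far from 3.5, while with 100,000 rolls it should be very close. The same reasoning is why a larger sample gives the machine a better picture of the population.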

This is also why **test set error** is preferred over **train set error**: a model has only truly learnt if it performs well on unseen data, whereas performing well on the data it was trained on may merely amount to 'memorising'.
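Here is a small sketch of the 'memorising' problem, using only the standard library. It assumes a made-up relationship `y = 2x` plus noise, and uses a nearest-neighbour lookup as a stand-in for a model that simply memorises its training data. All names and numbers are illustrative.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def make_data(n):
    """Noisy samples of the assumed relationship y = 2x."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        data.append((x, 2 * x + rng.gauss(0, 1)))
    return data

train, test = make_data(200), make_data(200)

def nearest_neighbour_predict(x):
    """Return the y of the closest training x: pure memorisation."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def mean_squared_error(data):
    """Average squared gap between predictions and the true y values."""
    return sum((nearest_neighbour_predict(x) - y) ** 2 for x, y in data) / len(data)

train_error = mean_squared_error(train)  # each point finds itself
test_error = mean_squared_error(test)    # unseen points expose the noise

print("train error:", train_error)
print("test error :", test_error)
```

The train set error is zero because every training point looks itself up, yet the test set error is clearly larger. Judged on train set error alone, the memoriser would look perfect, which is exactly why the test set error is the one to trust.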

![Scenes before man gets drilled by Law of Large Numbers](https://steemitimages.com/640x0/https://steemitimages.com/DQmdT1BdVdAgFYmjzfvFRtHKy9unx1XtU8RQgjBP5SvufMb/russian_roulette.jpg)

