Forward selection is a regression technique that begins with an empty model and adds variables one by one. In each step, we add the one variable that gives the single best improvement to the model.
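A minimal sketch of the idea with NumPy. It assumes "best improvement" means the lowest residual sum of squares of a least-squares fit, and it stops after a fixed number of variables; both choices, the helper names `rss` and `forward_selection`, and the synthetic data are illustrative assumptions rather than the only way to do it.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of a least-squares fit on `cols` plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_selection(X, y, max_features):
    """Start with no variables; repeatedly add the one that lowers the RSS the most."""
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        best = min(remaining, key=lambda c: rss(X, y, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic data: y depends strongly on column 2 and weakly on column 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 2] + 1.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

order = forward_selection(X, y, max_features=2)
print(order)  # column indices in the order they were added
```

In practice the stopping rule is often a significance level or an information criterion instead of a fixed count.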
Category: Blog
Statistical Significance
Imagine you are tossing a coin. You assume the coin is fair and has two sides: a picture and a number. On the first toss, the picture comes up. At this stage everything feels normal. The probabilities of the number side and the picture side are the same: 0.5.
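To put numbers on this intuition, here is a small sketch of how likely a fair coin is to show the picture on every one of n tosses in a row. The function name and the chosen values of n are illustrative assumptions.

```python
def prob_all_pictures(n_tosses, p_picture=0.5):
    """Probability that every one of n independent tosses shows the picture."""
    return p_picture ** n_tosses

# One picture is unremarkable; a long unbroken run becomes hard to blame on chance.
for n in (1, 2, 5, 10):
    print(n, prob_all_pictures(n))
```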
Fit and Transform
This is a very simple, yet useful concept to understand.
.fit computes whatever the chosen operation needs from the data, such as the mean, the most frequent value, or the median.
Once that value is computed, the .transform method applies it to the data.
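The two steps can be sketched with a toy class in plain Python. `MeanImputer` is a hypothetical name made up for this illustration; it mimics the fit/transform pattern and is not scikit-learn's own code.

```python
from statistics import mean

class MeanImputer:
    """Hypothetical toy class showing the fit/transform split."""

    def fit(self, values):
        # fit: learn the statistic (here the mean) from the non-missing values.
        observed = [v for v in values if v is not None]
        self.mean_ = mean(observed)
        return self

    def transform(self, values):
        # transform: apply the learned statistic by filling the missing values.
        return [self.mean_ if v is None else v for v in values]

ages = [20, None, 40, 30, None]
imputer = MeanImputer().fit(ages)   # computes the mean of 20, 40, 30
filled = imputer.transform(ages)    # replaces each None with that mean
print(filled)
```

Keeping the two steps separate means the value learned from one dataset can be reused on another, for example on test data.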
Backward Elimination
Backward elimination is a feature selection technique that removes features which don’t have a significant effect on the dependent variable or the predicted output.
Dummy Variable Trap in Regression
Assume we have a variable sex with two values: female and male. If we use two dummy variables in the analysis, say f for female and m for male, we run into multicollinearity (one value can be predicted from the other values). This happens because the f and m columns in the matrix are perfectly correlated: the i-th entry of f determines the i-th entry of m. For example, if the i-th entry of f is 0, then the i-th entry of m is 1, and vice versa. Thus, when the model also has an intercept, the determinant of the matrix we need to invert will be 0.
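A minimal sketch with NumPy, assuming the regression includes an intercept column and is solved through the matrix XᵀX. The five sample rows are made up for illustration.

```python
import numpy as np

f = np.array([1, 0, 1, 0, 1], dtype=float)  # 1 if female
m = 1.0 - f                                 # 1 if male, always the opposite of f
ones = np.ones_like(f)                      # intercept column

# Both dummies plus the intercept: f + m equals the intercept column.
X_full = np.column_stack([ones, f, m])
det_full = np.linalg.det(X_full.T @ X_full)
rank_full = np.linalg.matrix_rank(X_full)

# Dropping one dummy removes the redundancy.
X_reduced = np.column_stack([ones, f])
det_reduced = np.linalg.det(X_reduced.T @ X_reduced)

print(det_full, rank_full)  # determinant is (numerically) zero, rank is below 3
print(det_reduced)          # non-zero, so the matrix can be inverted
```

This is why one dummy column is usually dropped for each categorical variable.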
Categorical and Continuous Data
Categorical data is data that can be divided into groups, with a small number of categories that we can list. Examples include product type, gender, age group, and country. It takes a finite number of values.
Label Encoder and One Hot Encoder
Some of you might be confused about these two encoders, which are part of the scikit-learn library in Python. An encoder is used to convert categorical or text data into numbers. Why should it be converted? Because a model only understands numbers.
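To show the difference between the two encodings, here is a hand-rolled sketch in plain Python. It does not call scikit-learn; the sample values and the alphabetical ordering of the categories are assumptions made for the illustration.

```python
countries = ["France", "Spain", "Germany", "Spain"]

# Label encoding: each category becomes one integer.
classes = sorted(set(countries))            # ['France', 'Germany', 'Spain']
index_of = {name: i for i, name in enumerate(classes)}
labels = [index_of[c] for c in countries]

# One-hot encoding: each category becomes its own 0/1 column.
one_hot = [[1 if index_of[c] == i else 0 for i in range(len(classes))]
           for c in countries]

print(labels)
print(one_hot)
```

Label encoding implies an order between the integers that the categories may not have, which is the usual reason for preferring one-hot columns with unordered categories.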
Confusion Matrix
After building a model, of course we test it and get a result. But wait, how do we measure the effectiveness of our model? That’s what the confusion matrix is made for.
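A minimal sketch for a binary classifier in plain Python. The labels below are made up, and the layout (rows for actual classes, columns for predicted classes) is one common convention chosen here as an assumption.

```python
y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # classes predicted by the model

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

# Rows: actual 0, actual 1. Columns: predicted 0, predicted 1.
confusion = [[tn, fp],
             [fn, tp]]
accuracy = (tp + tn) / len(y_true)

print(confusion)
print(accuracy)
```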
Super Important Shortcut in Jupyter Notebook
A to insert a new cell above the current cell
B to insert a new cell below
M to change the current cell to Markdown
iloc
iloc is used for selecting rows and columns based on integer position. For example:
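A short sketch, assuming pandas is installed. The DataFrame and its column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "age": [28, 35, 41],
    "city": ["Oslo", "Rome", "Lima"],
})

first_row = df.iloc[0]         # first row, returned as a Series
first_two_rows = df.iloc[0:2]  # rows at positions 0 and 1 (the end is excluded)
cell = df.iloc[1, 2]           # row at position 1, column at position 2
features = df.iloc[:, :-1]     # all rows, every column except the last

print(cell)
print(list(features.columns))
```

Positions start at 0 and slices exclude the end position, just like Python lists.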