“It is better to be approximately right rather than precisely wrong.” - Warren Buffett
In machine learning, a model that focuses so closely on its training data that it misses the underlying pattern is said to be overfit. Such a model treats noise from unrelated data as signal, so its answers are far from correct and its accuracy is low. Even though the model trains well and achieves a low loss, it will not help us on new data, and hence these kinds of models perform…
Machine learning models are built on training data and then used to make predictions that address business problems. Many models can be built in machine learning, such as SVM, decision trees, random forests, logistic regression, and Naive Bayes.
Choosing the best model is sometimes challenging, as we need to find the right model with optimal parameters. Consider applying an SVM model to a particular data set: to optimize it, we have to decide on parameters such as the kernel, C, and gamma. Similarly, other models have their own parameters to be optimized…
An approach to building an efficient model.
Padding in general means a cushioning material. In a CNN, it refers to the pixels added around an image while it is being processed, which allows more accurate analysis. This extra space around the image helps the kernel cover the image fully and improves performance. It is especially helpful for detecting features at the borders of an image.
When the image undergoes convolution, the kernel moves across it according to the stride. While moving, the kernel scans each pixel, and in this process it scans some pixels multiple times…
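The effect of padding can be sketched in a few lines. This is a minimal illustration, assuming a 4x4 single-channel image and a 1-pixel border of zeros (the common "zero padding"):

```python
import numpy as np

# A 4x4 toy image; values are arbitrary, purely for illustration.
image = np.arange(16).reshape(4, 4)

# np.pad adds a border of zeros so that edge pixels can sit under
# the centre of the kernel and be convolved just like interior pixels.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

print(padded.shape)  # (6, 6): the 4x4 image plus a 1-pixel border
```

With this border, a 3x3 kernel sliding with stride 1 produces a 4x4 output, the same size as the input, which is what frameworks usually call "same" padding.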
Imagine scanning a 16x20 picture and a 2x2 downscaled copy of the same picture: which one do you think is scanned faster? The 2x2 version, of course, and with less computational power. This downsizing for faster processing is called pooling. Let us look at pooling in more detail.
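The most common variant, max pooling, can be sketched directly in NumPy. This is a toy implementation, assuming a 2x2 pool with stride 2 and input dimensions divisible by 2:

```python
import numpy as np

def max_pool_2x2(image):
    # Group pixels into non-overlapping 2x2 blocks, then keep only
    # the maximum of each block -- the image shrinks by a factor of 2
    # in each dimension, reducing later computation.
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.array([[1, 3, 2, 4],
                  [5, 7, 6, 8],
                  [9, 2, 1, 0],
                  [3, 4, 5, 6]])
print(max_pool_2x2(image))
# [[7 8]
#  [9 6]]
```

Each 2x2 block is replaced by its strongest activation, so a 4x4 input becomes a 2x2 output while the most salient responses are preserved.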
In deep learning, convolutional neural networks (CNNs) are a class of networks used to analyze visual data, for tasks such as image recognition, video recognition, image classification, and medical image analysis.
A CNN consists of an input layer, hidden layers, and an output layer. The hidden layers can include multiple convolutional layers. These layers convolve the input…
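What a convolutional layer actually computes can be sketched without any framework. This is a toy version, assuming a single-channel input, one 3x3 kernel, stride 1, and no padding:

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image; each output pixel is the
    # weighted sum of the patch currently under the kernel.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

image = np.ones((5, 5))
kernel = np.ones((3, 3)) / 9.0   # a simple averaging filter
print(convolve2d(image, kernel).shape)  # (3, 3)
```

In a real CNN the kernel weights are learned during training, and each hidden layer applies many such kernels in parallel to produce a stack of feature maps.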
Your data speaks a lot, get to know what it is by using this Machine Learning Library.
The data science lifecycle has many steps, from understanding the business problem to visualizing the data, and every step has to deal with data. Data passes through many phases in the lifecycle: it has to be gathered and cleaned, hypotheses formed, important features selected, machine learning models trained, and results evaluated, predicted, and visualized. Handling these steps requires certain modules, and building each one from scratch is very time consuming. Hence the existence of libraries.
XGBoost is a very powerful machine learning algorithm that can achieve high accuracy when its wide range of parameters is tuned for a supervised learning task. XGBoost stands for eXtreme Gradient Boosting. It works on parallel tree boosting, predicting the target by combining the results of multiple weak models. The XGBoost library implements the gradient boosting decision tree algorithm. Let us explore more using an example.
Missing data is a source of many problems in the world of data. Data professionals need complete data to analyze, and so are often forced to drop incomplete records, which can mean losing valuable data and inferential power. Missing data imputation is therefore more reasonable. But the standard approach of filling with the median or mode has its own challenges and can misrepresent the data at times, so there is a need to explore other ways of imputation. Let us look in detail at one such method, called proximity imputation.
In this procedure the data is first imputed using strawman imputation…
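The strawman step itself is the simple baseline mentioned above: fill numeric columns with their median and categorical columns with their most frequent value. A minimal pandas sketch, with made-up column names for illustration:

```python
import numpy as np
import pandas as pd

# Toy data with one numeric and one categorical column; the gaps
# (NaN/None) are the values to be imputed.
df = pd.DataFrame({
    "age":  [25.0, np.nan, 40.0, 31.0],
    "city": ["NY", "LA", None, "NY"],
})

for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        # Numeric column: fill with the column median.
        df[col] = df[col].fillna(df[col].median())
    else:
        # Categorical column: fill with the most frequent value.
        df[col] = df[col].fillna(df[col].mode()[0])

print(df)
```

Proximity imputation then refines these crude fills by looking at which observations land close together (for example, in the proximity matrix of a random forest) and re-imputing from similar rows.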
Is your machine learning model taking too long to train, and do you wonder why its accuracy is only moderate? XGBoost may be the solution. Let us look at how it can help.
XGBoost stands for eXtreme Gradient Boosting. It is a powerful machine learning algorithm for supervised learning. XGBoost works on parallel tree boosting, predicting the target by combining the results of multiple weak models. It offers great speed and accuracy.
The XGBoost library implements the gradient boosting decision tree algorithm. It is a software library that you can download and install on your machine, then access from a variety of…
Boosting is an ensemble meta-algorithm in machine learning that primarily reduces bias, and also variance, in supervised learning. It refers to a family of algorithms that transform weak learners into strong ones.
Boosting, initially called hypothesis boosting, is based on the idea of filtering or weighting the data used to train a team of weak learners, so that each new learner gives more weight to, or is trained only on, the observations that the previous learners classified poorly.
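This reweighting idea is exactly what AdaBoost implements, and scikit-learn makes it a few lines. A minimal sketch on synthetic data (generated purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# The default weak learner is a depth-1 decision tree ("stump").
# After each round, misclassified samples get a higher weight, so the
# next stump concentrates on the points its predecessors got wrong.
boosted = AdaBoostClassifier(n_estimators=50, random_state=0)
boosted.fit(X, y)
print(boosted.score(X, y))  # training accuracy of the combined ensemble
```

Each stump alone is barely better than guessing, but the weighted vote over fifty of them forms a strong learner.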
Real-world data often contains many rows and columns that are not needed for the analysis; to keep the data set concise, we need to drop them.
drop() is one of the main functions used to cleanse data. We can drop specified labels from rows or columns by calling drop() with the corresponding axis, the index or column names, and, when using MultiIndex labels, the level to drop from.
Syntax of the drop function:
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
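A small sketch of drop() in use, on a toy DataFrame whose column and row labels are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"],
                   "score": [1, 2, 3],
                   "notes": ["x", "y", "z"]})

# Drop a column by name (equivalently: df.drop("notes", axis=1)).
trimmed = df.drop(columns="notes")

# Drop a row by its index label. By default drop() returns a new
# DataFrame and leaves the original untouched, unless inplace=True.
shorter = trimmed.drop(index=0)

print(shorter.columns.tolist())  # ['name', 'score']
print(len(shorter))              # 2
```

Passing errors='ignore' suppresses the KeyError that is otherwise raised when a requested label does not exist.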
Data Scientist and Machine learning Engineer