Day 25: Feature Engineering

Feature Engineering • Univariate Selection • RFE • Tree-based feature selection
Feature engineering is the process of selecting the features in your data that contribute most to the prediction variable and of creating new variables from existing ones. Having irrelevant features in your data can decrease the accuracy of many models, especially linear algorithms such as linear and logistic regression. The objectives of feature engineering are:
• Reduce overfitting: less redundant data means less opportunity to make decisions based on noise.
• Improve accuracy: less misleading data means modelling accuracy improves.
• Reduce training time: less data means algorithms train faster.
One common feature selection method used with text data is chi-square (χ²) feature selection. The χ² test is used in statistics to test the independence of two events; in feature selection we use it to test whether the occurrence of a specific term and the occurrence of a specific class are independent. More formally, given a document collection D, we estimate the following quantity for each term t and class c and rank the terms by their score:

χ²(D, t, c) = Σ_{e_t ∈ {0,1}} Σ_{e_c ∈ {0,1}} (N_{e_t e_c} − E_{e_t e_c})² / E_{e_t e_c}

where N_{e_t e_c} is the observed number of documents with term indicator e_t and class indicator e_c, and E_{e_t e_c} is the count expected under independence. For each feature (term), a high χ² score indicates that the null hypothesis H0 of independence (meaning the feature has no influence on the dependent variable) should be rejected, i.e. the occurrence of the feature and the class are dependent. A high χ² value suggests the feature is useful in predicting the class variable.
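The following is a minimal sketch of χ² scoring on text data with scikit-learn's CountVectorizer and chi2 function; the tiny corpus and spam/ham labels are made-up illustrative data, not from the original notes.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

# Toy documents and labels (hypothetical example: 1 = spam, 0 = ham).
docs = [
    "cheap pills buy now",
    "limited offer buy cheap",
    "meeting schedule for monday",
    "project status and schedule",
]
labels = [1, 1, 0, 0]

# Term counts are non-negative, as the chi2 test requires.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# chi2 returns one score (and p-value) per term; a higher score means stronger
# evidence against independence between the term and the class.
scores, p_values = chi2(X, labels)

for term, score in sorted(zip(vectorizer.get_feature_names_out(), scores),
                          key=lambda pair: pair[1], reverse=True):
    print(f"{term:10s} chi2 = {score:.3f}")

Terms that appear mostly in one class (e.g. "cheap" in the spam documents) receive the highest χ² scores and would be kept by a selector.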
Univariate Selection
Statistical tests can be used to select the features that have the strongest relationship with the output variable. The scikit-learn library provides the SelectKBest class, which can be used with a suite of different statistical tests to select a specific number of features. Used with the chi-squared (chi2) statistical test for non-negative features, it selects the K best features from a given dataset. Commonly used score functions are:
• For regression: f_regression, mutual_info_regression
• For classification: chi2, f_classif, mutual_info_classif
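A short sketch of univariate selection with SelectKBest and the chi2 score function; scikit-learn's built-in Iris dataset is used here only because all of its features are non-negative, and k=2 is an arbitrary choice for illustration.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest chi2 scores.
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)

print("original shape:", X.shape)          # (150, 4)
print("reduced shape:", X_selected.shape)  # (150, 2)
print("chi2 scores:", selector.scores_)
print("selected feature indices:", selector.get_support(indices=True))

For a regression problem, the same pattern applies with f_regression or mutual_info_regression passed as score_func instead of chi2.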
