In most cases, I recommend that you don't use drop='first' with OneHotEncoder. Here's why:
1. Multicollinearity is rarely an issue with scikit-learn models
2. Before scikit-learn 1.0, drop='first' is incompatible with handle_unknown='ignore' (see the note below for 1.0 and later)
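3. It can be problematic if you standardize all features or use a regularized model, because dropping a category breaks the symmetry of the encoding and the choice of dropped category can change the results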
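To make the effect of drop='first' concrete, here is a minimal sketch comparing the default encoding with drop='first'. The toy data (cat, dog, fish) is made up for illustration and is not from the video.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: a single categorical feature with three categories
X = np.array([["cat"], ["dog"], ["fish"]])

# Default behavior: one column per category
full = OneHotEncoder().fit_transform(X).toarray()

# drop='first': the first category (in sorted order) gets no column of its own
dropped = OneHotEncoder(drop="first").fit_transform(X).toarray()

print(full.shape)     # one column per category
print(dropped.shape)  # one column fewer
print(dropped[0])     # the dropped category ("cat") is encoded as all zeros
```

Which category is dropped depends only on sort order, so the all-zeros baseline is arbitrary with respect to the modeling problem.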
Note: Beginning in scikit-learn 1.0, drop='first' and handle_unknown='ignore' can be used together. However, the dropped category and any unknown category will both be encoded as all zeros, so the model cannot tell them apart.
Watch all tips: https://www.youtube.com/playlist?list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6
Code for all tips: https://github.com/justmarkham/scikit-learn-tips