Tree based models for the win
Research demonstrates decision trees outperform deep learning with tabular data.

By AI Productivity Staff
In our recent customer use case, “Achieve accurate lot cycle time predictions for more on-time deliveries,” we noted that the most appropriate ML model for the customer’s needs was a gradient boosted tree-based model, specifically the Light Gradient Boosted Machine (LightGBM) implementation. This choice is supported by recent research from Léo Grinsztajn, Edouard Oyallon, and Gaël Varoquaux at Inria Saclay Centre and Sorbonne University, whose work concluded that tree-based models outperform deep learning on medium-size tabular data.
Noting that deep learning has “enabled tremendous progress on text and image datasets,”¹ the researchers stated that it had not been proven superior on tabular datasets. To compare the two families of models, they collected 45 tabular datasets, each comprising more than 3,000 real-world examples. They then trained standard and novel deep learning methods, such as a vanilla neural network, ResNet, and two Transformer-based models, as well as tree-based models including XGBoost, gradient boosting machines, and Random Forests, among others. Each model was trained 400 times, with each run using a configuration drawn at random from a predefined hyperparameter space.
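The random hyperparameter search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' benchmark code: the search space, the parameter names, and the `toy_evaluate` scoring function are hypothetical stand-ins for "train a model with this configuration and return its validation score."

```python
import random

# Hypothetical search space for a gradient boosted tree model.
# Names and ranges are illustrative assumptions, not those used in the paper.
SEARCH_SPACE = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform in [0.001, 1]
    "max_depth": lambda rng: rng.randint(2, 12),            # inclusive bounds
    "n_estimators": lambda rng: rng.choice([100, 300, 1000]),
}


def sample_config(rng):
    """Draw one configuration at random from the predefined space."""
    return {name: draw(rng) for name, draw in SEARCH_SPACE.items()}


def random_search(evaluate, n_iter=400, seed=0):
    """Evaluate n_iter random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_iter):
        config = sample_config(rng)
        score = evaluate(config)  # higher is better
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    """Placeholder score; a real run would train and validate a model here."""
    return -abs(config["max_depth"] - 6) - abs(config["learning_rate"] - 0.1)


best_config, best_score = random_search(toy_evaluate, n_iter=400, seed=0)
print(best_config, best_score)
```

In practice, `toy_evaluate` would be replaced by a function that fits the chosen model on the training split and returns a validation metric. The search loop itself stays the same for tree-based and deep learning models, which is what makes the comparison even-handed.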
When the results were averaged across all tasks, the best tree-based models performed 20 to 30 percent better than the best deep learning models. The authors also found neural networks to be much more susceptible than decision trees to random or less important data features. When the authors removed uninformative features, the performance gap between the two model families narrowed. When they added random features to the datasets, the performance of the neural networks declined sharply.
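The feature perturbations described above are easy to reproduce on your own tabular data. The sketch below is an assumption-laden illustration rather than the authors' procedure: the helper names `add_random_features` and `drop_features` are hypothetical, the sample rows are made up, and drawing the extra columns from a standard Gaussian is a choice made for this example.

```python
import random


def add_random_features(rows, n_extra, seed=0):
    """Return a copy of rows with n_extra pure-noise columns appended.

    The noise is standard Gaussian (an assumption of this sketch) and
    carries no information about the prediction target.
    """
    rng = random.Random(seed)
    return [
        list(row) + [rng.gauss(0.0, 1.0) for _ in range(n_extra)]
        for row in rows
    ]


def drop_features(rows, keep_indices):
    """Return a copy of rows keeping only the columns in keep_indices."""
    return [[row[i] for i in keep_indices] for row in rows]


# Made-up example rows with two informative columns.
X = [[1.0, 20.0], [2.0, 18.5], [3.0, 25.1]]

X_noisy = add_random_features(X, n_extra=3, seed=42)  # 2 real + 3 noise columns
X_reduced = drop_features(X_noisy, keep_indices=[0, 1])  # noise removed again

print(len(X_noisy[0]), len(X_reduced[0]))
```

Training the same tree-based and neural network models on `X`, `X_noisy`, and `X_reduced`, then comparing validation scores, gives a quick check of how sensitive each model is to uninformative features in a given dataset.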
The authors concluded, “Results show that tree-based models remain state-of-the-art on medium-sized data (∼10K samples) even without accounting for their superior speed.”
REFERENCE
1. Grinsztajn, L., Oyallon, E., Varoquaux, G. Why do tree-based models still outperform deep learning on tabular data? NeurIPS 2022 Datasets and Benchmarks Track, Nov 2022, New Orleans, United States. hal-03723551v2. https://hal.archives-ouvertes.fr/hal-03723551v2
© 2023 Applied Materials, All Rights Reserved.