Becoming a data scientist requires a significant interdisciplinary journey: besides training in mathematics, statistics, and computer science (e.g., programming and databases), it also requires a basic understanding of the system being modeled. It takes years of formal and on-the-job training to become an expert in this field. Data scientists have been increasingly employed on manufacturing lines to identify issues and improve overall manufacturing efficiency. The first step in this process is to extract data from numerous dissimilar sources and transform it into a more digestible format for modeling. Industry leaders have long contended that data wrangling is too time-consuming and unstructured to serve many of the quality needs of customers throughout the manufacturing industry [1], [2].
To relieve the data scientist of this data-wrangling burden, we have introduced the SmartFactory Rx Digital Platform, which provides an easy-to-use, highly configurable, minimal-code environment with established connectors to many common data sources, including flat files. With these connectors, extracting, transforming, and loading (ETL) and contextualizing data becomes less of a chore, so data scientists can spend their time more productively monitoring, gaining insights into, and optimizing the processes they are responsible for.
The initial stage of ETL and contextualization is just the tip of the iceberg of the data scientist's burden. Additional challenges that data scientists typically face in many manufacturing line setups today include working with multiple software packages for data gathering and analytics, depending on the data source, and addressing various connectivity and integration issues (Figure 1A). These challenges create a constant need for data scientists to rely on IT for support and diminish their ability to rapidly extract additional value from the data.