Defining self-service analytics
Self-service data analytics refers to tools that let users access and analyze data without relying on IT or system specialists for support. It entails integrating across data sources and applying machine learning easily through a self-service machine learning tool. In biopharma manufacturing today, the end user is most often a business user who wants information to address a specific issue or assess a KPI. This user could be a process development engineer, manufacturing operator, or manufacturing manager, among others.
Currently, data is stored in silos: building monitoring systems, which capture temperature, humidity, pressure, CO2 levels, and the entry and exit of personnel; lab information management systems; and systems that store process data. Until these sources are integrated, the data is incomplete and cannot be efficiently processed into information or used for effective modelling. We often help our customers integrate disparate systems to build a data hub, or data lake, where all this information is fully accessible.
Need for data fusion
Accessing the data often requires a data fusion layer that brings it all together, contextualizes it, aligns it, and handles different types of data. The nature of the data varies across sources: standard time series data, for example, differs from spectral data.
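As a minimal sketch of the alignment step, assume a room sensor and a process historian log on different clocks. The function name align_asof, the tolerance parameter, and all sample values below are hypothetical and chosen for illustration only. For each process timestamp, the sketch takes the most recent sensor reading at or before that time and treats readings older than the tolerance as missing.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def align_asof(target_times, source, tolerance):
    """For each target timestamp, take the latest source reading at or before it.

    source is a list of (timestamp, value) pairs sorted by timestamp.
    Readings older than `tolerance` are treated as missing (None).
    """
    source_times = [t for t, _ in source]
    aligned = []
    for t in target_times:
        # Index of the last source reading with timestamp <= t
        i = bisect_right(source_times, t) - 1
        if i >= 0 and t - source_times[i] <= tolerance:
            aligned.append(source[i][1])
        else:
            aligned.append(None)
    return aligned


# Hypothetical data: process records every 5 minutes, irregular humidity readings
start = datetime(2024, 1, 1, 8, 0, 0)
process_times = [start + timedelta(minutes=m) for m in (0, 5, 10, 15)]
room_humidity = [
    (start - timedelta(minutes=1), 41.0),
    (start + timedelta(minutes=4), 42.5),
    (start + timedelta(minutes=14, seconds=30), 43.0),
]

humidity_on_process_clock = align_asof(
    process_times, room_humidity, tolerance=timedelta(minutes=3)
)
print(humidity_on_process_clock)
```

With a three-minute tolerance, the 10-minute process record has no sufficiently recent humidity reading and is left as None rather than filled with a stale value. Whether to leave gaps, interpolate, or carry values forward is a design choice that depends on the signal and the model.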
What it means to analyze the data also varies. Analysis could involve machine learning models, AI, or neural networks, or it could be a calculation: a mechanistic relationship for which you just want an equation. The software needs to handle these different needs and possibly merge them.
Contextualizing data for the purpose
Data provides little information without context. To get the most valuable information from a model, the data needs to be contextualized in terms of the big picture: what you are trying to learn from the model. For example, you might want to determine whether the control system can keep up with the demands of the process. You could use a model that looks at how much the temperature fluctuates around a set point during a certain phase of a sterilization cycle. You wouldn’t feed the raw temperature into the model, because that signal streams constantly. Instead, you would specify the temperature during a certain part of the process, and include not only its value but also how fast it is increasing. Models use this type of preprocessed data.
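The sterilization example can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular product: the function name phase_features, the feature names, the phase boundaries, and the sample readings are all hypothetical. Readings are assumed to be (time in seconds, temperature) pairs with strictly increasing times.

```python
from statistics import mean, pstdev


def phase_features(readings, phase_start, phase_end, set_point):
    """Summarize temperature behaviour inside one process phase.

    readings: list of (time_in_seconds, temperature) pairs, sorted by time,
    with strictly increasing times.
    Returns None if fewer than two readings fall inside the phase.
    """
    # Keep only the readings that belong to the phase of interest
    window = [(t, temp) for t, temp in readings if phase_start <= t <= phase_end]
    if len(window) < 2:
        return None

    deviations = [temp - set_point for _, temp in window]

    # Rate of change between consecutive readings, in degrees per second
    rates = [
        (temp_b - temp_a) / (t_b - t_a)
        for (t_a, temp_a), (t_b, temp_b) in zip(window, window[1:])
    ]

    return {
        "mean_deviation": mean(deviations),
        "fluctuation": pstdev(deviations),  # spread around the set point
        "max_abs_deviation": max(abs(d) for d in deviations),
        "max_rise_rate": max(rates),
    }


# Hypothetical cycle: heat-up, a hold phase from 120 s to 300 s, then cool-down
readings = [
    (0, 100.0), (60, 118.0), (120, 121.0), (180, 121.4),
    (240, 120.6), (300, 121.0), (360, 110.0),
]
hold_features = phase_features(readings, phase_start=120, phase_end=300, set_point=121.0)
print(hold_features)
```

The model then receives a handful of phase-level features instead of the continuous stream, which is the sense in which the data has been contextualized.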
Data collection and fusion can sometimes be complex, as in a model we developed for contamination control, illustrated in figure 1 below.
We needed to look at the cleaning history of the room, as well as the operations and maintenance personnel coming in and out of the room, alongside particular events. Combining all this data, we were able to determine that there is a contamination risk for a certain unit operation when specific activities take place.
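To make the kind of combination described above concrete, the following sketch attaches room context to a single unit operation run. It is not the contamination control model itself, and it does not estimate risk: the function name contextualize_run, the role labels, the feature names, and the sample records are hypothetical. It only shows how cleaning history and personnel entries might be joined to a run by time.

```python
from datetime import datetime, timedelta


def contextualize_run(run_start, run_end, cleanings, entries):
    """Attach room context to one unit operation run.

    cleanings: list of datetimes when the room was cleaned.
    entries: list of (datetime, role) tuples for personnel entering the room.
    """
    # Time elapsed since the most recent cleaning before the run started
    prior_cleanings = [c for c in cleanings if c <= run_start]
    hours_since_cleaning = (
        (run_start - max(prior_cleanings)).total_seconds() / 3600
        if prior_cleanings
        else None
    )

    # Personnel who entered the room while the run was in progress
    roles_during_run = [role for t, role in entries if run_start <= t <= run_end]

    return {
        "hours_since_cleaning": hours_since_cleaning,
        "entries_during_run": len(roles_during_run),
        "maintenance_present": "maintenance" in roles_during_run,
    }


# Hypothetical records for one day
day = datetime(2024, 1, 1)
cleanings = [day - timedelta(hours=6), day + timedelta(hours=6)]
entries = [
    (day + timedelta(hours=9, minutes=15), "operations"),
    (day + timedelta(hours=9, minutes=40), "maintenance"),
    (day + timedelta(hours=13), "operations"),
]

run_context = contextualize_run(
    run_start=day + timedelta(hours=9),
    run_end=day + timedelta(hours=11),
    cleanings=cleanings,
    entries=entries,
)
print(run_context)
```

Rows like this, one per run, are the sort of contextualized input a downstream risk model could be trained or evaluated on.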
Ideally, a self-service analytics tool would have different models in place and different ways of combining data for those models. It would then have a dashboard that brings relevant information to each user, without the user having to create the view or request expert input from a data scientist. With a click or a drag, users could see what was happening on a piece of equipment, or why something happens at a particular point in the process. When users don’t have to spend time coding, they can direct their time to more complex tasks or analyses that require their skill sets.
Integrating data and skill sets
An understanding of the process will always be needed to develop an accurate model that meets the end user’s needs. When process experts understand how the requested information will be used, they can identify the variables that need to be included. As in the examples above, actionable insights come not from single data points but from data in the context of the process: how different pieces of equipment, people, and sensors interact. As self-service analytics matures, it offers the potential for greater integration not only of data sources, but also of human roles and skill sets.