
The integration of AI and cloud technologies (Part 1/2)

As AI deployment grows, cloud could pave the way forward
Samantha Duchscherer, Global Product Manager, and Emrah Zarifoglu, Head of R&D for Cloud and AI/ML, are among the automation experts working to bring AI-powered technologies to SmartFactory AI solutions. In this two-part series, they share a conversation about the role of cloud in AI training and deployment. Because the concept of ‘cloud’ varies widely in today’s tech landscape, they begin with a discussion of what cloud is and the decision-making process for moving to it.
Sam: I want to discuss AI and its integration with cloud technology, but I think it’s essential to first ensure I have a solid understanding of what ‘cloud’ means. How would you define cloud? Is it as straightforward as “a data center filled with countless machines that you can utilize without having to log into each one separately”?

Emrah: To some extent, yes. Being able to scale these resources dynamically and efficiently without needing to log into each machine individually is an important aspect of cloud. However, it is more than just a data center with a lot of machines. ‘Cloud’ is more of a catch-all. It is often a term that is used to encompass many technologies that provide computing power, storage, and a network infrastructure that is scalable and flexible.

Sam: I also understand that there are options like private and public clouds. Could you explain the key differences between them and how a company should decide which one to use?

Emrah: The difference between the two is basically who owns the infrastructure. In a public cloud, a third party owns the infrastructure and offers services and resources to the customer. There is a lot of standardization of the technology, which makes things easier for less experienced users, so public cloud is a great choice if you don’t have other concerns about it. Many companies are satisfied with the public cloud option.

However, when it comes to customization or a desire to use different technology, public cloud can present challenges. When you need flexibility, containerized, movable, and portable technology is the better fit. My recommendation would be to have a private cloud on premises so you can use containerization and Kubernetes to orchestrate your workloads and configure the environment according to your needs.

As long as you have an effective flow of communication between your cloud and on-prem infrastructure, private versus public cloud is really a matter of preference. The choice comes down to whether you want to own the infrastructure and the responsibility that comes with it, or outsource that so you can focus on your main function. Ultimately, the choice is use-case specific, driven by the factory’s needs and requirements.

Sam: What about all the new terms for cloud such as containerization, Docker, Kubernetes, and Helm charts? How do you even begin to explain these?

Emrah: Most of what we understand as cloud now is derived from the cloud revolution of the early 2000s, which introduced virtual machines as representations of physical machines.

A container is like a simple, lightweight virtual machine, and containerization refers to a method of packaging an application and its dependencies in a single unit. Docker, Kubernetes, and Helm charts are all container-management tools.

Docker is the most common platform that enables you to do containerization by automating the process. You’ll see containerization and the use of Docker at the development stage.
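To make the packaging idea concrete, here is a minimal, hypothetical Dockerfile for a small Python service; the file names and dependencies are illustrative assumptions, not a real SmartFactory component.

    # Minimal, illustrative Dockerfile for a small Python service.
    FROM python:3.11-slim          # official lightweight Python base image

    WORKDIR /app                   # working directory inside the container

    # Install declared dependencies first so this layer is cached
    # when only the application source changes.
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    COPY . .                       # copy the application source into the image

    CMD ["python", "app.py"]       # run the service when the container starts

Running docker build on this file produces a single image that carries the application together with everything it needs, which is exactly the portable unit described above.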

Kubernetes and Helm charts, on the other hand, are used during the deployment phase to manage the deployment at scale. Kubernetes is basically an orchestration tool for managing containerized applications, and Helm charts are packages of pre-configured Kubernetes resources.
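As a sketch of what Kubernetes actually consumes, here is a minimal, hypothetical Deployment manifest for a containerized service like the one above; the names, image tag, and replica count are illustrative assumptions. A Helm chart would package manifests like this one and template values such as the image tag, so the same chart can be reused across environments.

    # Minimal, illustrative Kubernetes Deployment manifest.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-service               # hypothetical application name
    spec:
      replicas: 3                    # Kubernetes keeps three copies running
      selector:
        matchLabels:
          app: my-service
      template:
        metadata:
          labels:
            app: my-service
        spec:
          containers:
            - name: my-service
              image: my-service:1.0  # the container image built earlier
              ports:
                - containerPort: 8080

Applying this manifest (for example, with kubectl apply) hands the rest to Kubernetes: it schedules the containers onto machines, restarts failed ones, and adjusts the number of running copies when the replica count changes.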

Sam: I’m now curious if companies have specific thresholds they follow when making decisions about their cloud infrastructure, or if these decisions are more dependent on individual situations and specific circumstances?

Emrah: Cost plays a big role in this. For instance, if you already have the infrastructure, say servers with a good 5 to 10 years of lifetime left, you might not want to incur additional cost over the next five years for new technology. You’d only want to move to the cloud if there are obvious advantages from the standpoint of scaling, flexibility, or the like. However, it’s really based on the particular use case.

If, for example, you are running a very GPU-heavy application, you’ll probably want to be closer to an on-prem solution rather than a cloud solution, because running GPUs on the cloud is pretty expensive. If I’m going to be doing this on a continuous basis for an extended period of time, I would probably rather have a GPU data center on my premises, or rented in another physical location, because it could cost half as much as running the same workloads on the cloud. Most often, the companies in this situation are AI companies training large language models (LLMs) and similar systems; they have their own data centers, or rent them, to do that training.
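A back-of-the-envelope comparison makes the trade-off concrete. The sketch below is a minimal Python calculation with entirely hypothetical prices (hourly cloud GPU rate, hardware purchase cost, yearly power and hosting); only the structure of the comparison, not the numbers, should be taken from it.

    # Illustrative on-prem vs. cloud GPU cost comparison.
    # All prices are hypothetical placeholders, not quotes from any provider.

    HOURS_PER_YEAR = 24 * 365

    def cloud_cost(gpus: int, years: float, hourly_rate: float) -> float:
        """Total cost of renting cloud GPU instances around the clock."""
        return gpus * hourly_rate * HOURS_PER_YEAR * years

    def onprem_cost(gpus: int, years: float, purchase_per_gpu: float,
                    yearly_opex_per_gpu: float) -> float:
        """Up-front hardware purchase plus ongoing power/hosting per GPU."""
        return gpus * (purchase_per_gpu + yearly_opex_per_gpu * years)

    # Hypothetical scenario: 8 GPUs used continuously for 3 years.
    cloud = cloud_cost(gpus=8, years=3, hourly_rate=2.50)
    onprem = onprem_cost(gpus=8, years=3, purchase_per_gpu=25_000,
                         yearly_opex_per_gpu=3_000)
    print(f"cloud:   ${cloud:,.0f}")    # $525,600 under these assumptions
    print(f"on-prem: ${onprem:,.0f}")   # $272,000, roughly half the cloud figure

The crossover depends entirely on utilization: for bursty workloads, renting wins; for continuous training over years, ownership tends to pull ahead, which is the pattern Emrah describes.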

Sam: To wrap up our brief yet insightful conversation and to set the stage for the next series of questions, I’m curious about your perspective on where the industry currently stands. Are we closer to the idea that “AI can’t do it alone,” or are we leaning more towards “the AI boom is here, but maybe the cloud isn’t ready”?

Emrah: First, we need to define what we mean by the AI boom. In certain areas, such as training large language models, there has definitely been significant progress. However, this progress often relies on scaling up existing technologies like graphics processing units (GPUs), which isn’t necessarily tied to the cloud.

When we talk about revolutionary changes in AI training techniques and methodologies, like the advent of transformers, we’re building on an existing understanding of these models. This has created a high demand for compute resources, a demand that the current supply of GPUs is straining to meet. From this perspective, the cloud isn’t the bottleneck; it’s the underlying technology, like GPUs, that plays the crucial role.

However, if there’s an inflection point where AI no longer relies on GPUs or we can utilize a different type of computing tool, then the cloud might need to adapt to embrace this new technology. For example, at that point, we could leverage all the connected resources globally to improve our training processes.

One area where the cloud might need to catch up is Kubernetes. While Kubernetes works with GPUs, it doesn’t do so as efficiently as we’d like for large-scale AI training. This is why many prefer non-Kubernetes deployments for significant AI training tasks. So, in that sense, the cloud might need to evolve, but in other areas it’s a different story.
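For context, Kubernetes schedules GPUs through an extended resource (nvidia.com/gpu) exposed by NVIDIA’s device plugin, and a pod requests them in its resource limits. The sketch below is a minimal, hypothetical example; the pod and image names are illustrative assumptions. Requests like this are for whole GPUs, and the fine-grained sharing and gang scheduling that large distributed training jobs rely on are where much of the friction Emrah mentions shows up.

    # Minimal, illustrative pod spec requesting one GPU via the
    # NVIDIA device plugin's extended resource (nvidia.com/gpu).
    apiVersion: v1
    kind: Pod
    metadata:
      name: training-job          # hypothetical name
    spec:
      restartPolicy: Never
      containers:
        - name: trainer
          image: my-trainer:1.0   # hypothetical training image
          resources:
            limits:
              nvidia.com/gpu: 1   # whole GPUs only; no fractional sharing here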

Cloud and AI deployment

Cloud infrastructure, whether public or private, is crucial for companies that need to process large volumes of data. This is especially true as the deployment of AI increases the demand for computing power, including GPUs. In our next blog, we’ll look more specifically at the relationship between cloud and AI technologies.

About the Authors

Samantha Duchscherer, Global Product Manager

Samantha is the Global Product Manager overseeing SmartFactory AI™ Productivity, Simulation AutoSched® and Simulation AutoMod®. Prior to joining the Applied Materials Automation Product Group, Samantha was Manager of Industry 4.0 at Bosch, where she was previously a Data Scientist. She also has experience as a Research Associate for the Geographic Information Science and Technology Group of Oak Ridge National Laboratory. She holds an M.S. in Mathematics from the University of Tennessee, Knoxville, and a B.S. in Mathematics from the University of North Georgia, Dahlonega.

Dr. Emrah Zarifoglu, Head of R&D for Cloud and AI/ML

Emrah leads the team delivering AI/ML solutions and cloud transformation of APG software for semiconductor manufacturers. He is a pioneer in developing SaaS applications, leading cloud transformation efforts, and building optimization and analytics frameworks for cloud computing. He holds patents in semiconductor manufacturing, cloud analytics, and retail science. He also has a well-established research record in planning and scheduling for semiconductor manufacturing. His work has been published in IEEE and INFORMS journals and presented at IERC and INFORMS conferences. He earned his Ph.D. in Operations Research and Industrial Engineering from the University of Texas at Austin. He holds B.S. and M.S. degrees in Industrial Engineering from Bilkent University, Turkey.
