All of the Nvidia lecture notes

Four modules Module 1 Module 2 Module 3 Module 4

Learning objectives for Introduction to AI List examples of the impact of AI across different industries Describe key milestones in the evolution of AI Understand a typical AI workflow and the main steps in each phase Describe at a high-level how neural networks work Identify common challenges enterprises face when adopting AI Articulate the value of NVIDIA end-to-end software stack for deploying AI solutions in production Explain what generative AI is and how the technology works Discuss at a high-level the main concepts of LLMs Describe the steps required for enterprises to unlock new opportunities for their business Explain GPU architecture and its functionality Understand Key differences between CPUs to GPUs Describe GPU Server Systems Understand GPU Virtualization Describe the NVIDIA deep learning software stack and the NVIDIA CUDA-X ecosystem Describe the benefits of the NGC and the Enterprise Catalog Describe the benefits and use cases of NVIDIA AI Enterprise Describe NVIDIA’s AI Workflows Outline the hosting environments for AI workloads, such as data centers and the cloud Enumerate the components constituting AI data centers Indicate the requisites and methods for managing and monitoring AI data centers

Welcome to the Nvidia AI Infrastructure and Operations Fundamentals course. Artificial intelligence, or AI, is transforming society in many ways. From speech recognition to self driving cars to the immense possibilities offered by generative AI. AI technology provides enterprises with the compute power to tools and algorithms. Their teams need to do their life's work. But what is needed to make AI work? What hardware and software technologies lie at its core? Play video starting at ::39 and follow transcript0:39 How can AI performance be optimized to achieve groundbreaking results? These are some of the questions that this course will help you address. By providing an introductory overview of key concepts, participants will gain insights into the underlying components and their pivotal role in transitioning AI from concept to to reality. Let's get started with an outline for this course. This self paced course introduces concepts and terminology that will help learners to start the journey to AI and GPU computing. This course spans four sections and covers Introduction to AI, AI Infrastructure, AI Operations, and lastly the Completion Quiz. Let's go over the course objectives. After completing this course, you should be able express how AI is transforming society. Explain terminology and concepts related to AI. Describe the evolution of GPU computing and its contribution to the AI revolution. Evaluate the Nvidia hardware and software solutions for AI. Identify the data center infrastructure building blocks and their role in building an AI. Recall AI in the data center operation tasks. Let's get started. Play video starting at :1:54 and follow transcript1:54 As previously mentioned, the course is divided into four sections where each contains several units. We start the journey with an introduction to AI where we cover AI basic concepts and principles. Then we delve into data center and cloud infrastructure followed by AI operations. At the end of each unit, you'll encounter a set of check your knowledge questions to reinforce your understanding. The approximate duration required to complete the course is 5.5 hours. Feel free to take breaks and resume the learning at your convenience. We conclude the course with a completion quiz consisting of 25 questions. Youll have 90 minutes to complete the quiz and a passing score of 70 is required. Upon successful completion, youll be entitled to the course completion certificate. Our exploration starts with presenting the basic AI features and principles. And covers AI use cases across different industries the evolution of AI into machine learning, deep learning and generative AI, each phase unlocking new capabilities. The emergence of generative AI applications producing content like music, images, and videos. How GPUs revolutionized AI. The importance of a suitable software stack ensuring optimal performance and efficiency. And lastly, the environments where AI workloads run, data centers or the cloud. Play video starting at :3:18 and follow transcript3:18 Section two, AI infrastructure comprises six units that provide you with the details and considerations you need to architect your data center for AI workloads such as compute platforms, networking, and storage for AI how energy efficient computing practices help data centers lower their carbon footprint and energy use how recommended design documents, called reference architectures can be used as a foundation for building best of breed designs. In the last unit in this section, we move the focus from an on prem data center to cloud based solutions that offer a flexible and accessible alternative. Section 3, AI Operations, is comprised of two units that provide you with details and considerations on how to effectively operate your AI data center. Including infrastructure management and monitoring, cluster orchestration, and job scheduling. Play video starting at :4:12 and follow transcript4:12 Upon completing the learning units, we encourage you to take the course completion quiz designed to assess the knowledge you've gained throughout the course. Successfully passing the quiz will entitle you to receive a certificate of completion. Are you ready? Let's embark on our AI journey. Welcome, this unit marks the beginning of the AI Essentials from Concept to Deployment course. Let's get things started with an overview of this unit. Play video starting at ::18 and follow transcript0:18 In the first unit, we're going to put the spotlight on a few selected industries and learn how they utilize AI in their own unique way. We will cover AI in healthcare, AI in financial services, and AI in autonomous vehicles. Play video starting at ::37 and follow transcript0:37 By the end of this unit, you'll be able to list examples of the impact of AI across different industries. Describe how AI is revolutionizing drug discovery and medical devices. Articulate examples of how AI is transforming industries in the financial sector Illustrate how automakers are using AI from design to self driving cars. Let's get started. Play video starting at :1:12 and follow transcript1:12 I am a translator, Play video starting at :1:18 and follow transcript1:18 transforming text into creative discovery, Play video starting at :1:27 and follow transcript1:27 movement into animation, and direction into action. Play video starting at :1:39 and follow transcript1:39 I am a healer, exploring the building blocks that make us unique. Play video starting at :1:51 and follow transcript1:51 Modeling new threats before they happen, Play video starting at :1:59 and follow transcript1:59 and searching for the cures to keep them at bay. Play video starting at :2:7 and follow transcript2:07 I am a visionary, generating new medical miracles Play video starting at :2:18 and follow transcript2:18 and giving us a new perspective on our sun to keep us safe here on Earth. Play video starting at :2:28 and follow transcript2:28 I am a navigator, discovering a unique moment in a sea of content. We're announcing the next generation and the perfect setting for any story. Play video starting at :2:47 and follow transcript2:47 I am a creator building 3D experiences from snapshots and adding new levels of reality to our virtual selves. Play video starting at :3:5 and follow transcript3:05 I am a helper bringing brainstorms to life, Play video starting at :3:20 and follow transcript3:20 sharing the wisdom of a million programmers, Play video starting at :3:26 and follow transcript3:26 and turning ideas into virtual worlds, build Northern forest. Play video starting at :3:36 and follow transcript3:36 I even helped write the script, Play video starting at :3:43 and follow transcript3:43 breathe life into the words and compose the melody. Play video starting at :3:58 and follow transcript3:58 I am AI brought to life by Nvidia deep learning and brilliant minds everywhere. A quote from Jensen Huang, the CEO of Nvidia, ''We are leveraging the capabilities of AI to perform intuitive tasks on a scale that is quite hard to imagine and no industry can afford to or wants to miss out on the huge advantages that predictive analytics offers.'' AI has the transformative potential to infuse every industry due to its capacity to enhance efficiency, decision making, and innovation. By leveraging advanced algorithms, machine learning, and data analytics, AI can streamline processes, automate tasks, and uncover valuable insights from vast data sets. To name a few examples, in call centers, AI can advance and accelerate many applications, such as AI virtual agents inside extraction and sentiment analysis. In retail, AI can be used to provide insights for store analytics, such as store traffic trends, counts of customers with shopping baskets, aisle occupancy, and more. In manufacturing, AI can help design products, optimize production, improve quality control, reduce waste, and increase safety. AI has seamlessly woven itself into the fabric of our daily lives, leaving an indelible impact on how we work, communicate, and navigate the world. Many familiar technologies found in our day-to-day lives are powered by AI, and many more are being added to an ever evolving landscape of possibilities. The computer industry is currently undergoing transformative shifts that are re-shaping the technological landscape. One notable trend is the explosion of generative AI, which is pushing the boundaries of artificial intelligence applications. Generative AI models are demonstrating unprecedented capabilities in natural language processing, content creation, and problem solving, integrating diverse data sources to create rich and contextually relevant outputs across various sensory modalities such as text, images, and sound. We'll discuss generative AI in greater detail in the third unit of this course, for now, let's shift our focus to our first industry, the healthcare sector. Play video starting at :6:26 and follow transcript6:26 Let's see some real world examples of AI in healthcare. The drug discovery process has traditionally been expensive, long, and laborious. It is a $1.2 trillion industry with approximately $2 billion research and development investment per drug. These drugs normally go through more than 10 years of development, and have a 90% failure rate. By providing rich and diverse datasets that can be leveraged to train and validate AI models, AI is revolutionizing drug discovery by significantly impacting various stages of the drug development process. Digital biology can be used to identify new drug targets, predict the efficacy and toxicity of drug candidates, and optimize the design of new drugs. By using computational methods to model the interactions between drugs and biological systems, researchers can accelerate the drug discovery process and reduce the costs associated with traditional drug development. Lab automation, which is generating vast amounts of data in drug discovery, and the ability to process and analyze this data is becoming increasingly important for drug development. Advanced data analytics and machine learning techniques are being developed to help make sense of this data and accelerate the drug discovery process. AI and computing are revolutionizing drug discovery by enabling researchers to analyze and understand large amounts of data, develop more accurate models, and identify promising new drug candidates more quickly and efficiently. Play video starting at :8:2 and follow transcript8:02 Medical devices are going through a revolution right now, where sensor information is augmented with software, advanced computing, and AI to do amazing things. Nowadays, medical devices can utilize continuous sensing, computation, and AI to detect, measure, predict, and guide high risk precision medical operations. This evolution suggests that the healthcare sector is heading towards a more robotic approach, augmenting clinical care teams to meet demand, maximize efficiency, and enhance access to care. Invenio Imaging is a medical device company developing technology that enables surgeons to evaluate tissue biopsies in the operating room immediately after samples are collected, providing in just three minutes, AI accelerated insights that would otherwise take weeks to obtain from a pathology lab. Invenio uses a cluster of NVIDIA RTX GPUs to train AI neural networks with tens of millions of parameters on pathologist annotated images. These models are layered on top of images, providing real time image analysis, and allowing physicians to quickly determine what cancer cells are present in a biopsy image. With the ability to analyze different molecular subtypes of cancer within a tissue sample, doctors can predict how well a patient will respond to chemotherapy or determine whether a tumor has been successfully removed during surgery. Play video starting at :9:30 and follow transcript9:30 Next, we'll talk about how AI affects financial services. One of AI's incredible breakthroughs is the ability to identify patterns hidden in vast amounts of data at levels of precision, speed, and scale that were previously impossible to reach. This ability brings multiple benefits to the financial services industry from enabling more intelligent trading, to expanding credit and services to underserved people. Play video starting at :10:1 and follow transcript10:01 Let's view a couple of different examples of how large financial institutions are leveraging AI to reduce fraud and improve customer experience. Our first example is the Frankfurt-based Deutsche Bank, a leading global investment bank, is working with NVIDIA to accelerate the use of AI and machine learning on transformational AI applications. Within NVIDIA's Omniverse platform, an open-computing platform for building and operating metaverse applications, Deutsche Bank is exploring how to engage with employees and customers more interactively, improving experiences using 3D virtual avatars to help employees navigate internal systems and respond to HR-related questions. Deutsche Bank and NVIDIA are testing a collection of large language models targeted at financial data called financial transformers or FinFormers. These systems will achieve outcomes such as early warning signs on the counterparty of a financial transaction, faster data retrieval, and identifying data quality issues. The second example is American Express. Credit and bank cards are a major target for fraud. To thwart fraudulent activity, American Express, which handles more than eight billion transactions a year, leverages deep Play video starting at :11:32 and follow transcript11:32 learning through the NVIDIA GPU. Let's move on to our final sector. Let's learn how AI is being used in autonomous vehicles. The impact of artificial intelligence in the automotive industry has been transformative, revolutionizing the way vehicles are designed, manufactured, and operated. Let's look at a few examples that illustrate this transformation. Design visualization. Photorealistic rendering empowers designers to take advantage of immersive, real-time, physically accurate visualizations. Engineering simulation. Engineers and simulation experts can rapidly analyze and solve complex problems with GPU-accelerated hardware and software tools. Industrial digital twin. With NVIDIA Omniverse enterprise, automakers can quickly develop and operate complex AI-enabled digital twins, maximize productivity, and help maintain faultless operation. Virtual show rooms and car configurators. As the buying experience migrates from physical retail spaces to online, dealerships can offer photoreal, interactive content for personal experiences. Intelligent assistance. With conversational AI, natural language understanding and recommendation engines, intelligent in-vehicle services can act as every passenger's digital assistant. Autonomous driving and parking. Autonomous vehicles are transforming the way we live, work, and play, creating safer and more efficient roads. These revolutionary benefits require massive computational horsepower and large-scale production software expertise. Tapping into decades-long experience in high-performance computing, imaging, and AI, NVIDIA has built a software-defined end-to-end platform for the transportation industry that enables continuous improvement and deployment through over-the-air updates. Play video starting at :13:52 and follow transcript13:52 Welcome, Daniel. I see a text from Hubert asking can you pick me up from the San Jose Civic? Should I take you there? Yes, please. Taking you to San Jose Civic. Start Drive Pilot. Starting Drive Pilot. Can you let Hubert know we're on our way? Sure. I'll send him a text. Play video starting at :14:50 and follow transcript14:50 I see Hubert. Play video starting at :14:56 and follow transcript14:56 Can you please take me to Rivermark Hotel? Taking you to Rivermark Hotel. Thanks for picking me up. Definitely. Start drive pilot. Starting drive pilot. Play video starting at :15:11 and follow transcript15:11 What building is that there? That building is San Jose Center for the Performing Arts. What shows are playing there? Katz is playing tonight. Can you get me two tickets for Saturday night? Yes, I can. Play video starting at :15:50 and follow transcript15:50 You have arrived at your destination. Please park the vehicle. Finding a parking spot. Play video starting at :16:23 and follow transcript16:23 Vehicle production is a colossal undertaking, requiring thousands of parts and workers moving in harmony. Any supply chain or production issues can lead to costly delays. Additionally, when automakers roll out a new model, they must reconfigure the layout of production plants to account for the new vehicle design. This process can take significant portions of the factory offline pausing manufacturing for existing vehicles. Mercedes Benz is digitizing its production process using the NVIDIA Omniverse platform to design and plan manufacturing and assembly facilities. NVIDIA Omniverse is an open 3D development platform enabling enterprises and institutions across all industries to build and operate digital twins for industrial and scientific use cases. By tapping into NVIDIA AI, and metaverse technologies, the automaker can create feedback loops to reduce waste, decrease energy consumption, and continuously enhance quality. Play video starting at :17:27 and follow transcript17:27 NVIDIA Omniverse is a scalable end to end platform, enabling all industries to build and operate digital twins for scientific research, infrastructure, product design, architecture, and more. Now, Mercedes Benz is using Omniverse to optimize new production and assembly facilities. Building a car requires thousands of parts and workers, all moving in harmony. Using digital twins created in Omniverse, an assembly line for a new model can be reconfigured in simulation without interrupting current production. Production planners can synchronize plants around the world, enabling over the air software updates to manufacturing equipment, streamlining operations while improving quality and efficiency. Mercedes Benz is preparing to manufacture its new EV platform at its plant in Rastatt, Germany. Operations experts are simulating new production processes in Omniverse, which can be used alongside existing vehicle production. This virtual workflow also allows the automaker to quickly react to supply chain disruptions, re-configuring the assembly line as needed. Using NVIDIA AI and Omniverse, Mercedes Benz is building intelligent, sustainable factories that improve efficiency, reduce waste, and continually enhance vehicle quality. Play video starting at :18:50 and follow transcript18:50 AI is powering change in every industry from speech recognition and recommenders to medical imaging and improved supply chain management. AI is providing enterprises the compute power, tools and algorithms their teams need to do their life's work. From the Cloud to the office to the data center to the edge, AI powered solutions revolutionize enterprise operations by enhancing efficiency, automating complex tasks, optimizing decision making processes, and unlocking valuable insights from vast datasets. In the following units of this course, you'll learn about the components of an end to end AI accelerated computing platform from hardware to software, forming the blueprint to a robust, secure infrastructure that supports, developed to deploy implementations across all modern workloads. Thank you for your time and attention. We'll see you in the next unit where we'll go over a thorough introduction to artificial intelligence. Welcome to the Introduction to Artificial Intelligence Unit. Today we'll delve into the foundations of AI. We'll unravel the basics and provide you with a solid understanding of artificial intelligence, laying the groundwork for your journey into this exciting field. In this unit, we cover an introduction to AI and its evolution through the years, typical steps of an AI workflow, a brief explanation of how deep learning works, ML and DL features and comparison, and challenges when deploying AI in production. By the end of this unit, you'll be able to describe key milestones in the evolution of AI, visualize a typical AI workflow and the main steps in each phase, describe in high level how neural networks work, identify common challenges enterprises face when adopting AI, and articulate the value of NVIDIA end-to-end software stack for deploying AI solutions in production. Let's get started. AI is a broad field of study focused on using computers to do things that require human level intelligence. It has been around since the 1950s, used in games like Tic-Tac-Toe and Checkers, and inspiring scary sci-fi movies, but it was limited in practical applications. Machine learning, or ML, came in the '80s as an approach to AI that uses statistics techniques to construct a model from observed data. It generally relies on human-defined classifiers or feature extractors that can be as simple as a linear regression or the slightly more complicated bag of words analysis technique that made email spam filters possible. This was very handy in the late 1980s when email spam started becoming an issue for many users. With the invention of smartphones, webcams, and social media services, and all kinds of sensors that generate huge mountains of data, a new challenge presented itself, that of understanding and extracting insights from all this big data. Real breakthroughs with deep learning, or DL, came around 2010, largely due to advances in hardware, the availability of large datasets, and improvements in training algorithms, which automated the creation of feature extractors using large amounts of data to train complex deep neural networks, or DNNs. Within only one decade from the advancements brought by DNNs, we are now in a new era of generative AI and large language models with systems that are surprisingly human-like in their intelligence and capabilities. Applications such as chat bots, virtual assistance, content generation, translation services, and more are impacting industries and our daily lives. We will continue the discussion on generative AI in the next unit. Play video starting at :3:8 and follow transcript3:08 An AI workflow, often referred to as a machine learning workflow or data science workflow, is a sequence of tasks and processes that data scientists, machine learning engineers, and AI practitioners follow to develop, train, deploy, and maintain artificial intelligence models. These workflows help to ensure that AI projects are systematic, well-documented, and effective. Let's consider a typical AI workflow broken down into four fundamental steps. Play video starting at :3:43 and follow transcript3:43 The first step is data preparation, which involves collecting, cleaning, and preprocessing raw data to make it suitable for training and evaluating artificial intelligence models. The size of a dataset used in model training can vary from small to very large datasets with billions of parameters. Ultimately, the quality, diversity, and relevance of the data are as important as the dataset size. Once the data has been prepared accordingly, it is fed into a model. The model training step of an AI workflow involves using a machine learning or deep learning model to learn patterns and relationships within a labeled data set. The model is trained over a set of data by using a mathematical algorithm to process and learn from the data. This is a critical part of the AI workflow. It involves teaching the AI model to recognize patterns and make predictions. Data scientists can iterate many times before producing a model that's ready for deployment. Model optimization is a crucial step in an AI workflow. It involves fine-tuning and enhancing the performance of the AI model to make it more accurate, efficient, and suitable for its intended use case. It's an iterative process where adjustments are made based on evaluation results and the model is fine-tuned until it meets the desired performance criteria for deployment. Once you've trained the model, it's ready to deploy into inference. It involves using a trained machine learning or deep learning model to make predictions, decisions, or generate outputs based on new unseen data. This step typically occurs after the model has been trained and validated, and is ready to be deployed in a real world or production environment. Inference is often the core of AI applications, where the model's ability to provide meaningful and accurate outputs is essential for addressing real world challenges and achieving the application's objectives. Play video starting at :5:49 and follow transcript5:49 Let's see a typical AI workflow example for deploying an image recognition solution alongside the tools that can be used in each step. Image is a radiology clinic that provides services such as MRIs, X-rays, and CT scans to several doctor offices. They want to enhance their services by adding image recognition of fractures and tumors, helping doctors and their diagnostics. Sarah, an ML engineer, gathers historical data sets containing X-rays, CT scans, and MRIs from hospital research institutes and their own inventory. For the data preparation step, she uses Rapids, an open source suite of GPU accelerated Python libraries built on Invidia AI to perform analytics and to prepare data for machine learning. She leverages the Rapids Accelerator for a patchy Spark, a plug in software that automatically intercepts and accelerates operations that can be sped up with rapid software and GPUs while allowing other operations to continue running on the CPU. Once the data prep is complete, Pytorch and Tensor Flow are the GPU accelerated computational frameworks that can be used to train the model at scale. They are now integrated with Nvidia Rapids to simplify enterprise AI development. Once the model training is complete, it can be optimized using Nvidia Tensor RT, a deep learning inference optimizer to fine tune and improve the model's performance, making it ready to be deployed, executed, and scaled. Lastly, AI Inference applies logical rules to the knowledge base to evaluate and analyze new information. She uses Invidia Triton Inference Server as an open source software that standardizes AI model deployment execution and takes care of all IT and Dev Ops deployment aspects such as load balancing. Play video starting at :7:48 and follow transcript7:48 We've discussed typical AI workflow steps and provided an example. Now, let's shift our attention to the intricacies of deep learning. Play video starting at :7:59 and follow transcript7:59 Jeffrey Hinton, the godfather of deep learning, and AI once said, I have always been convinced that the only way to get artificial intelligence to work is to do the computation in a way similar to the human brain. That is the goal I have been pursuing. We are making progress. Though we still have lots to learn about how the brain actually works. Let's get a better understanding of what exactly Jeffrey meant. Play video starting at :8:26 and follow transcript8:26 Earlier, we mentioned that deep learning harnesses deep neural networks, or DNNs, to achieve levels of accuracy that rival human capabilities. But have you ever wondered how these neural networks in AI are inspired by the human brain? Well, it all begins with a bit of neuroscience. Let's compare the works of a biological neuron to an artificial neuron. Artificial neural networks take a page from the human brains playbook. Picture this. In our brain, there are tiny components called neurons. Neurons are like tiny information messengers, they communicate through a series of events. First, dendrites, which act as the receiving antennas of neurons, pick up signals from neighboring neurons, terminal buttons. These signals are then sent to the cell nucleus for some processing magic. After that, the electrical impulse zips along a longer branch called the axon, making its way to the synapse. The synapse acts as a bridge passing the impulse to the dendrites of another neuron. It's like a relay race of information, creating a complex neural network in the human brain. As you can tell from the onscreen animation, artificial neurons are fundamentally inspired by the workings of biological neurons. Play video starting at :9:47 and follow transcript9:47 What is the deep learning workflow? Consider an application that automatically identifies various types of animals, in other words, a classification task. The first step is to assemble a collection of representative examples to be used as a training dataset, which will serve as the experience from which a neural network will learn. As we just learned, neural networks are algorithms that draw inspiration from the human brain in understanding complex patterns. If the classification is only cats versus dogs, then only cat and dog images are needed in the training dataset. In this case, several thousand images will be needed, each with a label indicating whether it is a cat image or a dog image. To ensure the training dataset is representative of all the pictures of cats and dogs that exist in the world, it must include a wide range of species poses and environments in which dogs and cats may be observed. Play video starting at :10:46 and follow transcript10:46 The next component that is needed is a deep neural network model. Typically, this will be an untrained neural network designed to perform a general task like detection, classification, or segmentation on a specific type of input data like images, text, audio, or video. Shown here is a simple model of an untrained neural network. At the top of the model, there is a row or layer that has five input nodes. At the bottom, there is a layer that has two output nodes. Between the input layer and the output layer are a few hidden layers with several nodes each. The interconnecting lines show which nodes in the input layer share their results with nodes in the first hidden layer, and so on, all the way down to the output layer. Nodes may be referenced as artificial neurons or preceptrons since their simple behavior is inspired by the neurons in the human brain. A typical deep neural network model would have many hidden layers between the input layer and the output layer, which is why it is called deep. We use a simplified representation on this slide for brevity. Play video starting at :11:59 and follow transcript11:59 The design of the neural network model is what makes it suitable for a particular task. For example, image classification models are very different from speech recognition models. The differences can include the number of layers, the number of nodes in each layer, the algorithms performed in each node, and the connections between the nodes. There are readily available deep neural network models for image classification, object recognition, image segmentation, and several other tasks. But it is often necessary to modify these models to achieve high levels of accuracy for a particular dataset. For the image classification task, to distinguish images of cats versus dogs, a convolutional neural network such as AlexNet would probably be used. AlexNet is comprised of nodes that implement simple generalized algorithms. Using these simple generalized algorithms is a key difference and advantage for deep learning versus earlier approaches to machine learning, which required many custom data specific feature extraction algorithms to be developed by specialists for each dataset and task. Play video starting at :13:10 and follow transcript13:10 Once a training dataset has been assembled and a neural network model selected, a deep learning framework is used to feed the training data set through the neural network. For each image that is processed through the neural network, each node in the output layer reports a number that indicates how confident it is that the image is a dog or a cat. In this case, there are only two options so the model needs just two nodes in the output layer, one for dogs and one for cats. When these final outputs are sorted in a most confident to least confident manner, the result is called a confidence vector. The deep learning framework then looks at the label for the image to determine whether the neural network guessed or inferred the correct answer. If it inferred correctly, the framework strengthened the weights of the connections that contributed to getting the correct answer, and vice versa. If the neural network inferred the incorrect result, the framework reduces the weights of the connections that contributed to getting the wrong answer. After processing the entire training dataset once, the neural network will generally have enough experience to infer the correct answer a little more than half of the time, slightly better than a random cointus. It'll require several additional rounds to achieve higher levels of accuracy. Play video starting at :14:31 and follow transcript14:31 Now that the model has been trained on a large representative dataset, it has become better at distinguishing between cats and dogs. But if it were shown a picture of a raccoon, it would likely assign comparable confidence scores to both the dog and cat, as it wouldn't be certain about identifying either one. If it was necessary to classify raccoons as well as dogs and cats, the design topology of the model would need to be modified to add a third node to the output layer. The training dataset would be expanded to include thousands of representative images of raccoons and use the deep learning framework to retrain the model. Play video starting at :15:13 and follow transcript15:13 Once the model has been trained, much of the generalized flexibility that was necessary during the training process is no longer needed, so it is possible to optimize the model for significantly faster run time performance. Common optimizations include fusing layers to reduce memory and communication overhead, pruning nodes that do not contribute significantly to the results and other techniques. The fully trained and optimized model is then ready to be integrated into an application that will feed it new data. In this case, images of cats and dogs that it hasn't seen before. As a result, it will be able to quickly and accurately infer the correct answer based on its training. Let's summarize the key differences in the realm of AI we've covered till now. When most technology companies talk about doing AI, they're talking about using machines to mimic human abilities, to learn, analyze, and predict. Machine learning achieves that by using large datasets and sophisticated statistical methods to train a model to predict outcomes from new incoming information. One of the most popular machine learning techniques used these days is deep learning. Deep learning uses artificial neural networks to learn from vast amounts of data to solve AI problems and really shines for use cases involving vision and speech. Generative AI is a type of artificial intelligence that uses machine learning algorithms to learn patterns and trends from the training data using neural networks to create new content that mimics human generated content. Play video starting at :16:51 and follow transcript16:51 Now that we've gained a deeper understanding of AI and the intricacies of deep learning workflows, let's turn our attention to the individuals and organizations who are actively employing AI. We'll also explore the challenges they encounter while trying to harness the capabilities of AI. In an AI data center, various stakeholders play crucial roles in the planning, development, and operation of AI infrastructure and applications. AI practitioners create applications that extract meaningful data from large data sets using machine learning. They desire an Agile Cloud native platform that can quickly evolve to support the latest in AI with accuracy and performance that can accelerate time to deployment. For enterprise IT who manages the company's infrastructure, data, and application life cycle, AI is still an emerging workload and rapidly evolving, managing rapidly changing applications and often open source platforms and tools can be a challenge for an IT department that is most likely also dealing with older infrastructure and technical debt. For example, they want to ensure the data centers infrastructure meets AI workloads demands with an optimized platform to bring AI into production. Line of business managers want to see more models deployed in production sooner by ensuring the efficient utilization of the investments in infrastructure and platforms. Leaders are constantly looking for the quickest way to excel return on investment and provide data driven results from their investment in AI infrastructure. Play video starting at :18:28 and follow transcript18:28 The benefits of AI are massive, but fully realizing those benefits requires a comprehensive solution. Along with many of these benefits come certain challenges which are essential to consider when adopting AI. Exploding model sizes and complexity. State of the art AI models continue to rapidly evolve and expand in size, complexity, and diversity. The rapid growth of AI models demands extensive computational resources and energy, potentially limiting affordability and sustainability and posing accessibility challenges for smaller organizations. The versatility required to deliver rich experience AI enabled applications like product recommendations, voice assistance, and contact center automation may require multiple different powerful models to be deployed within the same application in order to deliver a fantastic user experience. Performance and scalability. Training these AI models and customizing them for your unique application is an intense, complex iterative process. End to end performance considering both the individual steps and each overall iteration is critical for accelerating toward a solution. Taking AI to production requires tools to support the end to end AI life cycle, compute infrastructure, and a robust support model to ensure all key stakeholders, the data scientists, engineers, developers, and operators are able to meet their unique goals. Play video starting at :20:3 and follow transcript20:03 Nvidia contributes to addressing these challenges by providing AI practitioners with top tier development tools, frameworks, and pre-trained models. Additionally, the platform offers reliable management and orchestration solutions for IT professionals, guaranteeing performance, high availability, and security. The Nvidia AI software stack enables the full AI pipeline from data prep, from model training through inferencing, and ultimately scaling. It accelerates time to production with AI workflows and pre-trained models for specific business outcomes such as intelligent virtual assistance, digital fingerprinting for real time cybersecurity threat detection, and recommender systems for online retail. Finally, your AI solution is optimized and certified to deploy everywhere from public cloud to data centers to the edge devices. This provides the flexibility and reduces the risk of moving from pilot to production caused by infrastructure and architectural differences between environments. Play video starting at :21:8 and follow transcript21:08 Well done for making it to the end of this unit. Now you should be able to describe key milestones in the evolution of AI, visualize a typical AI workflow and the main steps in each phase, describe in high level how neural networks work, identify common challenges enterprises face when adopting AI, articulate value of Nvidia end to end software stack for deploying AI solutions in production. Continue the journey by taking the next unit, Generative AI Overview. Welcome back to the AI Essentials from concept to deployment course. We are now entering the third unit which provides an overview of generative AI. Play video starting at ::18 and follow transcript0:18 Here's the outline for this unit. Unit 3 is aimed at understanding what generative AI is, how the technology works, an overview of large language models or LLMs, and the steps required to deploy generative AI solutions in the enterprise. Play video starting at ::36 and follow transcript0:36 By the end of this unit, you'll be able to explain what generative AI is and how the technology works. Discuss at a high level, the main concepts of large language models and describe the steps required for enterprises to unlock new opportunities for their business. Play video starting at ::54 and follow transcript0:54 In the previous unit, we learned about the AI process that allows us to reach generative AI, but let's get a better understanding of exactly what it is. Generative AI refers to a subset of artificial intelligence that focuses on creating data or content, such as images, text, and multimedia, based on patterns and examples from a given dataset. What sets generative AI apart is its versatility, enabling it to perform a wide array of tasks beyond text and chat based applications. These systems can generate diverse forms of content, including realistic images, videos, music, and even entire virtual environments by learning from and synthesizing patterns in the input data. Its adaptability to diverse data types and applications underscores its potential to transform multiple industries and augment human capabilities across a wide spectrum of tasks. Play video starting at :1:50 and follow transcript1:50 Generative AI is making inroads into every industry, transforming traditional practices, and bringing forth unprecedented operational efficiency and innovation. In finance, it enhances fraud detection, personalized banking, and provides valuable investment insights. Within healthcare, it powers molecule simulation, drug discovery, and clinical trial data analysis. Retail benefits from personalized shopping, automated catalog descriptions, and automatic price optimization. In manufacturing, it transforms factory simulation, product design, and predictive maintenance. These are just some examples of the innovations brought by generative AI. Play video starting at :2:37 and follow transcript2:37 Foundational models serve as the basis or foundation for the creation and evolution of generative AI systems, providing the initial framework for understanding complex language structures, semantics, and contextual nuances. They consist of AI neural networks trained on massive, unlabeled datasets to handle a wide variety of jobs such as generating text images, summarizing documents, translating languages and more. One of the breakthroughs with generative AI models is the ability to leverage different learning approaches, including unsupervised or semi supervised learning for training. That is, it relies on finding patterns and structures in the data on its own without requiring labeled data to train a model. Play video starting at :3:26 and follow transcript3:26 DALL.E creates realistic images from text descriptions. It can be used for image synthesis tasks such as image captioning, image editing, or image manipulation. eDIFF-I is a diffusion model for synthesizing images given text which generate photorealistic images corresponding to any input text prompt. LLAMA 2 can be used for generating diverse and high quality natural language text, making it valuable for various tasks such as content creation, language understanding, and conversational AI applications. NVIDIA GPT is a family of production ready large language models, or LLMs, that can be tuned to build enterprise generative AI applications that can perform a range of tasks from creating product descriptions, answering customer queries, and writing code. GPT 4 gives applications the ability to create human like text and content images, music and more, and answer questions in a conversational manner. We just saw examples of foundation models, let's now discuss how they are trained. The Large Language Models or LLMs powering the advances in generative AI are a significant turning point. They've not only cracked the code on language complexity, enabling machines to learn context, infer intent, and be independently creative, but they can also be fine tuned for a wide range of different tasks. A foundation model is trained on a large amount of unlabeled data. That is, data that does not have any predefined categories, labels, or annotations such as raw text, images, audio or video. Unlabeled data is abundant and diverse and can be obtained from various sources such as the Internet, social media platforms, or proprietary data sets. A foundation model trained on text data can be used to solve problems related to natural language processing, such as question answering, information extraction, et cetera. The possibilities for what a foundation model can generate are endless and depend on the creativity and ingenuity of the users who apply them to different problems and domains. Large language models utilize a specialized neural network known as the transformer, to grasp patterns and relationships within textual data. They undergo pre training on extensive text data sets and can be fine tuned for specific tasks. The goal of the language model is given the preceding words in a context to predict the next word. While this example pertains to the English language, the prediction could apply to a computer programming language or another language. The model generates text one word at a time based on an input prompt provided by the user. In this case, the input prompt is, write a review about an Italian restaurant I visited and enjoyed. The input prompt is broken down into smaller tokens that are then fed back into the model. The model then predicts the next word in the sequence based on the tokens it has received. This process continues until the user stops providing input, or the model reaches a predetermined stopping point. LLMs are constructed based on tokens which represent the smallest units of meaning in a language. Tokens encompass words, characters, sub words, or other symbols representing linguistic elements. The transformer model architecture empowers the LLM to comprehend and recognize relationships and connections between tokens and concepts using a self attention mechanism. This mechanism assigns a score, commonly referred to as a weight, to a given item or token to determine the relationship. Play video starting at :7:11 and follow transcript7:11 Generative AI models often involve complex mathematical operations and require intensive computations. GPUs are designed to be highly effective for parallel processing. This parallelism enables faster training and inference times for generative AI models compared to using traditional CPUs. GPUs, Excel in parallel processing, matrix operations, memory capacity and memory bandwidth, making them an ideal choice for powering up generative AI. They significantly accelerate the training and inference processes, enable working with large scale models and support real time applications. For example, ChatGPT was trained on 10,000 in video GPUs for weeks. While generative AI offers significant benefits, including increased efficiency and cost savings, its adoption does not come without challenges. To successfully implement such solutions, you'll need to address various technical, ethical, and regulatory issues. It's our responsibility to build using guardrails or rules to mitigate inappropriate outcomes. Data privacy and security. Generative AI use cases in the healthcare and financial sectors should be monitored very closely to forestall any money related or sensitive data leakages, IP rights and copyright. Generative AI platforms should mitigate copyright infringement of the creator's work. Bias errors and limitations. Generative AI is just as prone to biases as humans are, because in many ways, it is trained on our own biases. Ethical implications. Determining responsibility for the outputs of generative AI can be challenging. If AI systems generate harmful content, it may be unclear who bears responsibility, the developers, the users, or the technology itself. Malevolent activities. There is no state of the art know how that wrongdoers can't put to their evil uses, and generative AI is not an exception where fraudulent scams of various kinds can be created. Play video starting at :9:20 and follow transcript9:20 Having gained a deeper understanding of generative AI, let's now explore its practical applications in the enterprise. Generative AI is a powerful branch of artificial intelligence that holds immense potential for addressing various challenges faced by enterprises. While many customers have previously explored AI using traditional classification named entity recognition or NER, and natural language processing or NLP tasks. There's a gradual migration towards large language models for these familiar tasks. As organizations become more acquainted with LLMs, they're increasingly recognizing the value of generative AI and expanding its application across various workloads. In this slide, we'll explore key scenarios where generative AI can be leveraged to effectively solve enterprise challenges. Generative AI produces new content based on patterns and trends learned from training data. Traditional AI, on the other hand, focuses on detecting patterns, making decisions, honing analytics, classifying data, and detecting fraud. Generative AI and traditional AI are not mutually exclusive but complementary. While generative AI presents exciting opportunities for enterprises to overcome challenges and unlock new possibilities, traditional AI models are still able to address many use cases and tend to be less compute intensive. Play video starting at :10:43 and follow transcript10:43 This chart shows the customization spectrum that different enterprise customers would need based on their generative AI use cases. The largest category on the left pertains to generative AI as a service, likely the one familiar to many through experiences with ChatGPT. Here you input a prompt and receive a corresponding response. If the response is unsatisfactory, you can modify the prompt to obtain a new one. While the model remains constant, it receives input refinement or reinforcement to assist in delivering a more accurate answer. The next level, moderate customization, is where enterprises will add additional parameters to the existing pre-trade model and slightly tune it. This moderate customization still requires infrastructure, expertise, and investment. The next one is extensive customization. Here's where customers are building their own foundational models. They require extensive fine tuning and even more investment in infrastructure and AI expertise. Play video starting at :11:44 and follow transcript11:44 Building foundation models from scratch requires a lot of data and compute resources making it a very costly process. It's often large technology companies, research institutions, and well funded start ups that have the resources and expertise to undertake such projects. Customizing a pre-trained model, on the other hand, requires less data resources and expertise. It involves feeding the model with domain specific data tasks and adjusting the model parameters accordingly. It's less resource intensive, but also requires some knowledge of the model capabilities, the data format, and the evaluation metrics. As a result, many organizations choose the more cost effective approach of leveraging existing foundational models like those offered by AI research companies and fine tune them to their specific needs. Play video starting at :12:34 and follow transcript12:34 While generative AI shows tremendous promise for delivering business value, companies must also make substantial investments to build custom LLMs to meet their needs. Building custom LLMs requires massive amounts of training data. To get these models to understand, predict, and generate human like text, we need to feed them with a substantial corpus of diverse and high quality data. This presents a challenge in terms of not only data collection, but also its curation, storage, and management. The sheer scale of computations required for training these models is immense. It demands a robust large scale computing infrastructure which is expensive. Implementing LLMs requires more than just the right hardware. It also necessitates the right software. Organizations need tools that address both training and inference challenges, from algorithm development to accelerating inference on a distributed infrastructure. LLMs are complex and sophisticated. They require a deep understanding of AI, machine learning, and data science principles, building and fine tuning these models require teams with a high degree of technical expertise in these areas which can be difficult to find and retain. Play video starting at :13:47 and follow transcript13:47 Embarking on the journey of generative AI involves some key steps. In essence, deploying generative AI in an organization is not just about the technology, but also about aligning it with business goals, team capabilities, data strategies, infrastructure readiness, and a commitment to responsible AI. Identify the business opportunity. We must focus on use cases with substantial business impact and ones that can be enhanced by our unique data. These opportunities form the bedrock of our generative AI strategy. Build out domain and AI teams. This involves identifying our internal resources and coupling them with AI expertise from partners and application providers forming an interdisciplinary team that understands both our business and the AI landscape. Analyze data for training and customization. This is where we acquire, refine, and protect our data in order to build data intensive foundation models or customize existing ones. Invest in accelerated infrastructure. This includes assessing our current infrastructure, architecture, and operating model while carefully considering associated costs and energy consumption. The right infrastructure will enable an efficient and effective deployment of our AI solutions. Develop a plan for responsible AI. This means leveraging tools and best practices to ensure that our AI models and applications uphold ethical standards and operate responsibly. In essence, deploying generative AI in an organization is not just about the technology, but also about aligning it with business goals, team capabilities, data strategies, infrastructure readiness, and a commitment to responsible AI. Play video starting at :15:36 and follow transcript15:36 Let's look at some of the steps involved in building generative AI applications, from data preparation to deploying an LLM into production. The data acquisition phase involves collecting and preparing the data that'll be used to train and fine tune the LLM. The data can come from various sources such as public datasets, web scraping, user generated content, or proprietary data. It's important that the data is diverse and representative of the target domain. Once there's enough data gathered, comes data curation. This phase involves cleaning, filtering, and organizing the data that will be used to train and fine tune the LLM. The pre-training phase of an LLM involves exposing the model to a vast corpus of text data to facilitate the learning of language patterns, relationships, and representations. This phase typically incorporates a foundational model as the starting point. Customization allows the adaptation of a generic model to the specific requirements of a given task or domain, thereby improving its accuracy, efficiency, and effectiveness. Model evaluation is the process of assessing the performance and effectiveness of a machine learning model. It involves measuring how well the model is learned from the training data and how accurately it can make predictions on unseen or new data. After a model has been trained on a dataset, it is deployed for inference, where it processes input data and produces output such as classifications, predictions, or recommendations, depending on the specific task it was trained for. Adding guardrails to an LLM is crucial for fostering responsible AI practices and mitigating the risks associated with the misuse or misinterpretation of the generated text. It helps ensure ethical, safe, and responsible use of the model. Play video starting at :17:30 and follow transcript17:30 The NVIDIA generative AI platform is built on robust hardware, versatile software, and high quality enterprise grade support. This combination allows NVIDIA to provide a fully production ready generative AI solution to empower enterprises to develop custom, large language models for diverse applications such as language processing, multimodal use cases, healthcare, life sciences, and visual content creation. At the pinnacle of this platform, NVIDIA has developed AI Foundations, a set of tools and services designed to advance enterprise level generative AI. These tools allow for customization across various use cases, from text based applications through NVIDIA NeMo, visual content creation with NVIDIA Picasso, and biological computations using NVIDIA BioNeMo. These services are layered atop the NVIDIA AI enterprise, a software suite that streamlines the development and deployment of generative AI, computer vision and speech AI, allowing organizations to focus more on extracting valuable insights and less on maintenance and tuning. At the base of this technological pyramid lies NVIDIA's Accelerated Compute Infrastructure which is flexible and versatile. It can operate anywhere, be it on Cloud platforms or on-premises. Well done on completing this unit. Now that you've finished this unit, you should be able to explain what generative AI is and how the technology works, discuss the generative AI market trends and the challenges in this space with your customers, describe the steps required for enterprises to unlock new opportunities for their business. You've reached the end of the third unit in the AI Essentials from Concept to Deployment course. In the next unit, we'll explore the acceleration of AI through GPUs. Thank you for dedicating your time and attention. Welcome to the accelerating AI using NVIDIA GPUs unit. This unit introduces you to NVIDIA GPUs, the engines for accelerated compute. Let's jump over to the agenda to see what we have in store in this unit. In this unit, we cover a historical perspective on GPUs, looking at why GPUs were developed in the first place and how they've evolved to become a key component for accelerating computing in the modern data center. A deep dive into general GPU architecture, a head to head comparison between CPUs and GPUs. An overview of GPU server systems within the data center. Lastly, an introduction into the last three generations of NVIDIA GPU architecture. Let's examine the learning objectives for this unit. After completing this unit, you should be able to, recall significant milestones in GPU history and describe key developments in the evolution of GPU technology. Explain the core components of GPU architecture and their functions, demonstrating a clear understanding of how GPUs work. Analyze and compare the architectural differences between CPUs and GPUs, highlighting their strengths and limitations in various computing scenarios. Apply knowledge of GPU server systems to plan, configure, and deploy GPUs effectively, taking hardware configurations and deployment strategies into consideration. Evaluate NVIDIA's AI GPU families, assess their features, and determine which GPU family best suits specific AI and deep learning use cases based on their capabilities and characteristics. We have a lot to cover, so let's get started. Delving into the rich and ever evolving history of GPUs, we uncover the milestones and technological transformations that have shaped the world of graphics processing. A graphics processing unit, or GPU, is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. Computer graphics have undergone significant evolution since the 1970s. Their importance cannot be overstated, as humans primarily rely on vision to process the information presented by computers. Images on screens are comprised of picture elements, or pixels. A pixel is the smallest unit of a digital image that can be displayed and represented on a digital display device. Pixel characteristics include position, color, and brightness. Every pixel in an image must be processed at such a rate that the human eye does not perceive any delays or inconsistencies as it absorbs the image being presented. As display and computer technology have advanced, screens now have more pixels, leading to more realistic image representation. This is referred to as screen resolution, which represents pixel density. The processing behind the pixels is done by the GPU. As screen resolutions have increased, the processing power necessary to represent each pixel has also increased. GPUs have evolved over time to become a fundamental building block in computer architecture. The 1980s saw the development of individual graphics display processors, and the 1990s sees the development of separate boards or cards that can be modularly replaced in computer systems. Now that you have an understanding of the history and evolution of the GPU, let's turn our attention to this groundbreaking architecture. Play video starting at :3:53 and follow transcript3:53 GPU architecture forms the core foundation of modern graphics processing. In this exploration, we'll delve into the intricate design and functionality that powers these essential components of accelerated computing. This is an image of a typical GPU. At the heart of the chip are thousands of GPU cores. A core is the component of the GPU that processes data, onboard cache memory, which acts as a typical cache, storing a copy of everything for quick, reliable data access. Parallel processing is possible with the use of multiple cores. Closest to the cores is the high speed GPU memory. This memory is designed specifically for GPU use and can be shared with other GPUs. Now that you have an understanding of the general GPU architecture, let's turn our attention to the factors that have led to the rise of NVIDIA GPU computing. AI models continue to grow in complexity and at an astonishing rate. In just the past three years, the size of state-of-the-art AI models has grown by three orders of magnitude, and this exponential pace will continue. This growth in data and AI model sizes requires more compute, which is possible through GPUs but not through CPUs, also required a mature set of frameworks and tools to maximize performance and accelerate deployment. NVIDIA is at the center of the AI stack, from architecture and platforms to CUDA, from frameworks to the triton server and NGC. GPU computing has given the industry a path forward to keep up with the expected performance evolution. This success is achieved through a highly specialized parallel processor design, which permeates the approach to the system design, system software, algorithms and optimized applications. Continuous optimizations between hardware and software produce ongoing performance gains. Let's explore some key data center trends that also contribute to the rise in GPU computing. The data center landscape is accelerating. Consider the pace at which new services are being adopted and to see how AI is accelerating the adoption of new services and capabilities. For instance, let's examine the time it took these apps to amass 100 million users. WhatsApp attained 100 million users in 49 months. ChatGPT did so in just two months. AI is accelerating how fast new services come along and connect with the community, and this is stimulating demand for advanced computing power. At the same time, there is sensitivity to climate change and the need for greener computing. There's also a challenge to get access to more compute in data centers around the world. Data center energy usage is exceeding 200 terawatts per year. The data center represents about 2% of the global energy usage, and this percentage is projected to increase to 5% by 2030. Data centers are power limited and take years to plan and build to meet the demand to deliver these new services, data center operators need to optimize the infrastructure they have with the power constrained data centers they have. Accelerated computing is one way to achieve that goal. Let's explore why accelerated computing is the path forward. NVIDIA CEO Jensen Huang famously stated Moore's law is dead, making the bold claim several times dating back to 2017. The end of Moore's law refers to the slowing down of exponential transistor density growth on microchips, leading to challenges in achieving regular and significant performance improvements in traditional semiconductor technology. It's essentially a physics problem, as transistors would eventually reach the limits of miniaturization. >> For nearly four decades, Moore's law has, The governing dynamics of the computer industry, which in turn has impacted every industry. The exponential performance increase at constant cost and power has slowed, yet computing advance has gone to light speed. The warp drive engine is accelerated computing, and the energy source is AI. The arrival of accelerated computing and AI is timely as industries tackle powerful dynamics, sustainability, generative AI. And digitalization without Moore's law as computing surges, data center power is skyrocketing and companies struggle to achieve net zero. The impressive capabilities of generative AI created a sense of urgency for companies to reimagine their products and business models. Industrial companies are racing to digitalize and reinvent into software driven tech companies to be the disruptor and not the disrupted. [MUSIC] >> CPUs are simply unable to keep up with the complex workload demands associated with accelerated computing. This limitation will only get worse as the size and complexity of models increase. Accelerated computing requires a comprehensive and integrated full stack approach that encompasses hardware, software, and frameworks to harness the full potential of accelerators like GPUs for complex workloads. For example, an Nvidia AI platform with H100 GPUs set new time to train records at scale across every workload. This includes the new LLM workload, where training times were reduced from days to hours and, in the case of the largest LLMs, from a month down to a week. It's important to highlight that Nvidia consistently pushes the boundaries of GPU technology, as demonstrated by our latest innovation. The Grace Hopper GH200, which we'll delve into in unit seven, which focuses on compute platforms for AI. Let's perform a deep dive comparison between the CPU and GPU to better understand the strengths and weaknesses of each. In the world of computing, understanding the distinctions between CPUs and GPUs is pivotal. In this topic we embark on a journey to compare and contrast these fundamental processing units. Central processing units, or CPUs, are a computer component designed to process complex instruction sets that execute code and manipulate data. Originally, instructions were processed one at a time in the processing unit of the chip, called the core. The core reads and executes the program instructions. As CPU architecture evolved, multicore processors were developed. This allowed several instructions to be processed simultaneously, leading to an increase in processing performance. GPUs are designed to execute simple instruction sets. Consequently, the number of cores that can be built in a comparatively similar silicon area is much larger than with the CPU. With relatively many more cores than a CPU, a GPU allows processing many simple instructions simultaneously. Both CPUs and GPUs are system components that work in tandem to process code and data. Let's look at some CPU characteristics over the last few years, CPUs have moved to a multicore architecture, with the latest CPUs containing up to 128 cores with fast clock speeds. CPUs also have large main memory however, the bandwidth of that memory is relatively low, which affects how quickly we can move data about. CPUs are designed to run multiple different applications concurrently. Each of these applications is assigned to a single thread or a small number of threads, and is scheduled in a time sliced manner to run on the CPU. This requires low latency to minimize the delay between issuing a request for data and executing the instructions on the data. This implies large caches to hold the data required by the threads, and complex control logic to ensure that the data is ready to go when the thread is running. One consequence of this is that a large amount of the silicon on a CPU is dedicated to data movement. Meaning that CPUs have relatively low performance per watt, as a significant proportion of the energy is used for data movement rather than actual calculations. In addition, there are cache misses when trying to access data which isn't in the cache yet, which can be very detrimental to performance. Let's look at the GPU in comparison. A GPU is optimized for executing highly parallel tasks stemming from its roots and generating actual computer graphics, where the same operation is applied to millions of pixels multiple times a second in order to render scenes. Modern GPUs have a huge number of compute cores, over 10,000 for some of the latest cards. However, these compute cores don't have the complex logic for prefetching data, for example. So instead, the GPU deals with the issue of latency by hiding it with computation. Essentially, we assign the GPU more tasks than its physical cores can handle. Contrasting the approach taken with the CPU, the GPU scheduler launches a thread that tries to execute an operation, for example, an addition. If the data is not ready for the thread to use, it issues a fetch command to the memory, and it stalls while waiting for the data to arrive. The scheduler then moves on to another thread to execute its instructions, and so on and so forth, creating a pipeline of issued instructions across a series of threads. Eventually, the original thread's data is ready and can continue execution. This switching between the threads hides the latency of the memory fetching and the issuing of instructions. Thus, the more work given to the GPU to perform, the easier it is for the scheduler to hide this latency. So what's needed is many overlapping concurrent threads. This contrasts with how we program for the CPU, where we typically allocate one or two threads per compute core. To support this, the GPUs have a very large register file to hold the state of the threads, and switching between them happens at no time penalty, even as often as every clock cycle. As such, most of the silicon on a GPU is given over to computation rather than data movement, giving them very efficient performance per watt. In addition, a GPU has very high bandwidth main memory compared to a CPU, as many applications are bandwidth bound rather than compute bound. This can deliver significant performance improvements. The additional bandwidth, though, comes at a cost, with the GPU memory being significantly faster but relatively smaller. For example, the A 100 GPU has an 80GB memory version, large for a GPU, but small compared to the CPU. However, it runs at a massive two terabytes per second bandwidth, pretty much an order of magnitude faster than the CPU memory. Let's dive deeper and explore how GPU acceleration works. A typical application has parts which need to be executed sequentially and parts which can be executed in parallel. Although the parallel parts may only be a small part of the overall application, they're usually the most compute intensive and time assuming. Therefore, by executing them in parallel on a GPU, we can see huge. Huge increases in performance, often orders of magnitude faster than on a CPU alone. There are many algorithms used across a huge range of domains, which can be parallelized and thus see significant performance improvements from this approach. Lt's contrast how data is processed by a CPU and GPU. Play video starting at :16:21 and follow transcript16:21 Let's consider how the flow of data occurs for those tasks that are offloaded between the CPU and GPU. Note that the CPU and GPU are two distinct entities. They each have their own memory, so we need to bear this in mind when programming for the GPU. First, we need to move the data we wish to work on from the CPU memory to the GPU memory. This is usually via the data path provided by the peripheral component interconnect, or PCI bus. The code to be executed by the GPU on the data is then copied from the CPU to the GPU. Once loaded, it is launched, and the operations on the data take place. Within the GPU, the data takes advantage of the various layers of cache for faster performance. After the GPU finishes processing, the resultant data is copied back to the CPU memory for any additional processing if required. Although this basic processing flow illustrates data passing through a PCI bus, it's important to mention that alternative GPU interconnection methods like NVlink exist. We will delve into these in a subsequent unit. Now that you have an understanding of the way data is processed from both a GPU and CPU perspective, let's turn our attention to GPU server systems. GPU server systems represent the backbone of high performance computing and deep learning infrastructure. In this section, we'll navigate the fundamentals of GPU servers and their ever growing ecosystem. A GPU server is a specialized computer system equipped with graphics processing units, or GPUs, designed to accelerate complex computations in the following ways. GPUs Excel in parallel processing tasks, making them ideal for tasks such as deep learning, scientific simulations, and data analysis. GPUs contain thousands of cores that can perform multiple calculations simultaneously. This parallel processing capability enables faster execution of tasks compared to traditional central processing units or CPUs, especially for applications that involve massive data sets and complex algorithms. GPU servers are optimized for specific workloads such as artificial intelligence, machine learning, and graphics rendering. They can dramatically reduce processing times, enabling researchers, engineers, and developers to solve complex problems and innovate more efficiently. Nvidia offers specialized GPUs designed for data center environments. While typically available in a PCIe form factor, GPUs can also come in an SXM or MXM form factor. Flagship examples of types of GPU systems include DGX H100, is a fully integrated hardware and software solution on which to build your AI center of excellence. Nvidia HGX H100 combines H100 tensor core GPUs with high speed interconnects to form the world's most powerful servers. The Nvidia H100 PCIe debuts the world's highest PCIe card memory bandwidth greater than 2000Gb/second gbps. This speeds time to solution for the largest models and most massive datasets. Nvidia MGX a modular reference design that can be used for a wide variety of use cases, from remote visualization to supercomputing at the edge. MGX provides a new standard for modular server design by improving ROI and reducing time to market. Nvidia cannot do it alone, therefore, we tap into a robust partner and cloud service provider ecosystem to power their accelerated compute solutions. Play video starting at :20:10 and follow transcript20:10 In today's enterprise market, modern data centers are responsible for almost any computing challenge. With AI, high performance computing, and data science growing exponentially, there is no doubt that an enterprise data center will need to have GPU systems to support this demand. All major data center server vendors offer GPU based AI systems Nvidia, Cisco, Dell, HPE, IBM, Lenovo, and others. These servers support two to 16 GPUs each and can be interconnected to create multi-server parallel processing systems. Also, Nvidia GPUs have been adopted by every major cloud provider, extending accelerated computing into the cloud. Let's explore how GPUs are consumed within the data center. There is an ever expanding range of workloads that lend themselves to GPU acceleration. Compute intensive tasks like AI training and inference, data analytics, and high performance computing, or HPC. General purpose tasks like visualization, rendering, virtual workstation, and deep learning. High density virtualization through solutions like virtual desktop and workstation. Enterprise edge solutions in controlled environments. Industrial edge solutions within industrial or rugged environments. Desktop workstations that support design, content creation, and data science workloads. Mobile workstations that facilitate design, content creation, data science, and software development workloads. Now that you have a solid grasp of the primary applications of GPUs in the data center, let's review the key takeaways from this unit and look ahead to the next lesson in the course. Play video starting at :22: and follow transcript22:00 Now that you've completed this unit, you should be able to recall significant milestones in GPU history and describe key developments in the evolution of GPU technology. Explain the core components of GPU architecture and their functions, demonstrating a clear understanding of how GPUs work. Analyze and compare the architectural differences between CPUs and GPUs, highlighting their strengths and limitations in various computing scenarios. Apply knowledge of GPU server systems to plan, configure, and deploy GPUs effectively considering hardware configurations and deployment strategies. Evaluate NVIDIA's AI GPU families, assess their features, and determine which GPU family best suits specific AI and deep learning use cases based on their capabilities and characteristics. Great progress don't stop here, continue the journey with unit 5, AI software ecosystem, which details a dynamic and rapidly evolving landscape where cutting edge algorithms, frameworks, and tools converge to enable groundbreaking artificial intelligence applications. See you in the following unit. Welcome. In this unit, we'll cover the software ecosystem that has allowed developers to make use of GPU computing for data science. We'll start with a brief overview of VGPU as a foundational technology. From there, we'll move into what frameworks are and their benefits with AI. We'll also provide an overview of the Nvidia software stack and Cudax AI software acceleration libraries. Later, we'll move on to Nvidia containerized software catalog known as NGC and discuss how Nvidia is extending AI to every enterprise using virtualization with Nvidia AI enterprise software suite. Play video starting at ::48 and follow transcript0:48 By the end of this unit, you'll be able to understand virtual GPU as a foundational technology upon which the AI ecosystem sits. Briefly describe the deep learning stack and CUDA define the steps that make up the AI workflow. Identify the various types of workflows from open source third party vendors as well as those provided by Nvidia. See what makes up NGC and the enterprise catalog and discuss their benefits. Walk through and describe the benefits and features of Nvidia AI, enterprise and Nvidia's provided AI workflows. Let's get started before we get into AI frameworks and the way Nvidia provides and supports these frameworks, let's take a few minutes to briefly cover VGPU as a foundational technology. Play video starting at :1:39 and follow transcript1:39 The workplace is experiencing a pandemic disruption that is changing the form and perspective about how we work. The adoption of digital technologies has helped organizations respond to the unprecedented challenges and increasingly make a mobile workforce more prevalent. By 2030, end user computing is expected to grow to $20 billion and 40% of storage and compute, shifting towards servicebased models. However, to build an enhanced digital workspace for the post pandemic recovery and beyond, we must move beyond defensive short team models and focus on sustainable, resilient operating methods. Improved user experience paired with security stands at the forefront of the corporate agenda. In fact, 53% of IT executives report their companies are increasing investment in digital transformation, while 49% are looking to improve efficiencies. Play video starting at :2:34 and follow transcript2:34 This is where Nvidia Virtual GPU technology comes into play, allowing it to deliver graphics-rich virtual experiences across their user base. Whether deploying office productivity applications for knowledge workers or providing engineers and designers with high-performance virtual workstations to access professional design and visualization applications, it can deliver an appealing user experience and maintain the productivity and efficiency of its users. Play video starting at :3:1 and follow transcript3:01 Application and desktop virtualization solutions have been around for a long time, but their number one point of failure tends to be user experience. The reason is very simple. When applications and desktops were first virtualized, GPUs were not a part of the mix. This meant that all of the capture, encode and rendering that was traditionally done on a GPU in a physical device was being handled by the CPU in the host. Enter Nvidia's virtual GPU or VGPU solution. It enables it to virtualize a GPU and share it across multiple virtual machines or VMs. This not only improves performance for existing VDI environments, but it also opens up a whole new set of use cases that can leverage this technology. Play video starting at :3:49 and follow transcript3:49 With our portfolio of virtual GPU solutions, we enable accelerated productivity across a wide range of users and applications. Knowledge workers benefit from an improved experience with office applications, browsers, high-definition video, including video conferencing like Zoom, Webex, and Skype. For creative and technical professionals, Nvidia enables virtual access to professional applications typically run on physical workstations, including CAD applications or design applications such as Revit and Maya. It enables GIS apps like Esri, ArcGIS Pro, oil and gas apps like Petrel, financial services like Bloomberg, healthcare apps like Epic, or manufacturing apps like Keisha, Siemens, NX, and SolidWorks, to name a few. Play video starting at :4:40 and follow transcript4:40 Our virtual software is available for on prem data centers and also in the cloud. Nvidia Virtual PC vpc and Virtual Apps VAps software for knowledge and business workers. Nvidia RTX Virtual Workstation VWs for creative and technical professionals such as engineers, architects, and designers. We have a series of courses to walk you through each software offering. Please review the virtualization sales curriculum for more detailed information. Play video starting at :5:12 and follow transcript5:12 Let's review how Nvidia Virtual GPU software enables multiple virtual machines to have direct access to a single physical GPU while using the same Nvidia drivers that our customers deploy on non-virtualized operating systems. On the left hand side, we have a standard VMware EsxI host. VMware has done a great job over the years virtualizing CPU workloads. However, certain tasks are more efficiently handled by dedicated hardware such as GPUs, which offer enhanced graphics and accelerated computing capabilities. Play video starting at :5:46 and follow transcript5:46 On the right side, from the bottom up, we have a server with a GPU running the ESXi hypervisor. When the Nvidia VGPU manager software, or Vib, is installed on the host server, we're able to assign VGPU profiles to individual VMs. Play video starting at :6:3 and follow transcript6:03 Nvidia branded drivers are then installed into the guest OS, providing for a high end user experience. Play video starting at :6:10 and follow transcript6:10 This software enables multiple VMs to share a single GPU or, if there are multiple GPUs in the server, they can be aggregated so that a single VM can access multiple GPUs. This GPU-enabled environment provides unprecedented performance while enabling support for more users on a server because work that was done by the CPU can now be offloaded to the GPU. Most people understand the benefits of GPU virtualization: the ability to divide up GPU resources and share them across multiple virtual machines to deliver the best possible performance. But there are many other benefits delivered by Nvidia virtual GPU software included in the Nvidia AI enterprise suite, which go beyond just GPU sharing. With Nvidia VGPU software, it can deliver bare metal performance for compute workloads with minimal overhead. Running virtualized integrations with partners like VMware provides a complete lifecycle approach to operational management, from infrastructure right-sizing to proactive management and issue remediation. These integrations allow for the use of the same familiar management tools from hypervisor and leading monitoring software vendors for deep insights into GPU usage. Nvidia VGPU supports live migration of accelerated workloads without interruption to end users. This allows for business continuity and workload balancing. The ability to flexibly allocate GPU resources means that it can better utilize the resources in their data center. Since virtualization enables all data to remain securely in the data center, the solution helps to ensure infrastructure and data security. Play video starting at :7:52 and follow transcript7:52 Let's now explore deep learning. We'll start with a brief review of what it is, then walk through an AI workflow. From there, we'll talk. We'll talk about the AI software stack in CUDA-X. Play video starting at :8:6 and follow transcript8:06 Deep learning is a subclass of machine learning. It uses neural networks to train a model using very large datasets in the range of terabytes or more of data. Neural networks are algorithms that mimic the human brain in understanding complex patterns. Labeled data is a set of data with labels that help the neural network learn. In the example here, the labels are the objects in the images, cars, and trucks. The errors that the classifier makes on the training data are used to incrementally improve the network structure. Once the neural network based model is trained, it can make predictions on new images. Once trained, the network and classifier are deployed against previously unseen data, which is not labeled. If the training was done correctly, the network will be able to apply its feature representation to correctly classify similar classes in different situations. To understand the AI ecosystem, you have to start with the workflow. The first step is the process of preparing raw data and making it suitable for the machine learning model. Examples of tools for this are NVIDIA RAPIDS and the NVIDIA RAPIDS Accelerator for Apache Spark. Once the data is processed, we move on to the training phase. This is where we teach the model to interpret data. Examples of tools for this are PyTorch, the NVIDIA TAO Toolkit, and TensorFlow. Next, we refine the data through optimization. An example tool for this is TensorRT. Finally, we deploy the model, making it available for systems to receive data and return predictions. NVIDIA Triton Inference Server is a tool we would use to do this. So what are frameworks? Frameworks are designed to provide higher-level building blocks that make it easy for data scientists and domain experts in computer vision, natural language processing, robotics, and other areas to design, train, and validate AI models. They can be an interface library or tool which allows developers to more easily and quickly build models. Data scientists use frameworks to create models for a variety of use cases such as computer vision, natural language processing, and speech recognition. For example, mxnet is a modern open-source deep learning framework used to train and deploy deep neural networks. It is scalable, allowing for fast model training, and supports a flexible programming model and multiple languages. The mxnet library is portable and can scale to multiple GPUs and multiple machines. Scikit Learn is a free software machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms and is designed to interoperate with the Python numerical and scientific libraries, NumPy and SciPy. TensorFlow is a popular open-source software library for dataflow programming across a range of tasks. It is a symbolic math library and is commonly used for deep learning applications. Data scientists can use frameworks to create models for a variety of use cases such as computer vision, natural language processing, and speech recognition. The diagram shows the software stack for deep learning. The hardware is comprised of a system, which can be a workstation or a server with one or more GPUs. The system is provisioned with an operating system and an NVIDIA driver that enables the deep learning framework to leverage the GPU functions for accelerated computing. Containers are becoming the choice for development in organizations. NVIDIA provides many frameworks as Docker containers through NGC, which is a cloud registry for GPU-accelerated software. It hosts over 100 containers for GPU-accelerated applications, tools, and frameworks. These containers help with faster and more portable development and deployment of AI applications on GPUs across the cloud, data center, and edge, and are optimized for accelerated computing on GPUs. Hence, the stack includes running the NVIDIA Docker runtime specific for NVIDIA GPUs. The containers include all the required libraries to deliver high-performance GPU acceleration during the processing required for training. The CUDA Toolkit is an NVIDIA groundbreaking parallel programming model that provides essential optimizations for deep learning, machine learning, and high-performance computing, leveraging NVIDIA GPUs. There are two ways you can go about building an AI platform. You can either take the do-it-yourself approach or leverage NVIDIA AI enterprise, both of which we'll discuss over the next two sections. Leveraging open-source software has become a mainstream method for AI and machine learning development because it can be collaboratively shared and modified upon distribution. However, building your own AI platform based on open source can be risky without a robust support for production AI. Open-source software is often distributed and maintained by community developers. Without the dedicated resources for quality assurance and verifications, open-source software deployment is often limited to the current GPU architecture and offers only self-service support. With NVIDIA AI Enterprise, enterprises who leverage the open-source practices can build mission-critical applications on top of the NVIDIA AI platform. NVIDIA AI Enterprise provides NVIDIA Enterprise support and hardware testing and certifications for past, current, and future GPUs. Now that you have an understanding of the two ways you can build an AI platform, let's explore the benefits of the NVIDIA AI Enterprise solution. In order to use a do-it-yourself or build your own approach, or download and use NIDIA AI Enterprise, all software for either of these approaches is provided in the NVIDIA's NGC and the enterprise catalog. Let's take a few minutes to explore that now. Navigating the world of software stack for AI and accelerated applications is complex. The stack varies by use cases. AI stack is different from HPC simulation apps and genomics Stack is different from the visualization app. The underlying software stack to run a particular application on different platforms, from on-prem to Cloud, from bare metal to container, and from VM to microservices also varies. NGC catalog offers containerized software for AI, HPC, data science, and visualization applications built by NVIDIA and by our partners. The containers allow you to encapsulate the application and its complex dependencies in a single package, simplifying and accelerating end-to-end workflows, and can be deployed on-premises in the cloud or at the edge. NGC also offers pre trained models across a variety of domains and AI tasks such as computer vision, NLP, and recommender systems. Such pretrained models can be fine tuned with your own data, saving you valuable time when it comes to AI model development. Finally, for consistent deployment, NGC also has helm charts that allow you to deploy your application and NGC collections, which bring together all the necessary building blocks, helping you build applications faster. The pretrained models in the NGC catalog are built and continually trained by NVIDIA experts. For many of our models, we provide model resumes. They're analogous to a potential candidate's resume. You can see the data set the model was trained on training epochs, batch size, and, more importantly, its accuracy. This ensures That users can find the right models for their use case. The NGC catalog has rich collections of general purpose, such as Resnet 50 and Unet. More importantly, the catalog also provides application specific models such as people or vehicle detection, pose and gaze estimation. You'll also find models in conversational AI that include speech recognition, texttospeech, language translation, and more. Not only do you get these rich assortments of models, but these models can also be easily fine tuned with your custom data or can be easily integrated into industry SDKs like Reva or Deepstream. Containers are now ubiquitous when it comes to developing and deploying software. A container is a portable unit of software that combines the application and all its dependencies into a single package that is agnostic to the underlying host OS. In scientific research, containers allow researchers to easily reproduce and corroborate without having to rebuild the environment from scratch. NVIDIA NGC containers offer certified images that have been scanned for vulnerabilities and are thoroughly tested. Some of our containers are backed by enterprise support via the NVIDIA AI enterprise program. The containers are designed to support multi GPU and multi node applications for high performance. NGC containers can be run with many container runtimes, including docker, cryo containered and singularity on bare metal virtual machines and kubernetes environments. With a monthly update cadence for deep learning containers such as Tensorflow and Pytorch, the containers are continually improving to offer the best performance possible while targeting the latest versions of software. To provide easy access and support to your AI journey without having to build it yourself. NVIDIA AI enterprise is the easiest onramp. The next sections will briefly walk you through what it is, what it does, and how to find it. Play video starting at :18:2 and follow transcript18:02 NVIDIA AI platform consists of three important layers, accelerated infrastructure that provides accelerated computing to power the entire AI technology stack. AI platform software, which is the NVIDIA AI enterprise software suite for production AI and AI services for enterprises to easily build AI applications leveraging state of the art foundation models. We'll be focusing on NVIDIA AI enterprise, the software layer of the most advanced AI platform. NVIDIA AI platform provides reliability and security for production AI consisting of four important layers. Infrastructure optimization and cloud native management or orchestration layers are essential to optimize your infrastructure to be AI ready. Cloud native management and orchestration tools facilitate deployment of the solution in cloud native and hybrid environments. AI and data science development and deployment tools includes the bestinclass AI software that's needed for development and deployment. AI workflows, frameworks and pretrained models are designed for enterprise to quickly get started with developing specific AI use cases and addressing business outcomes. For example, customers might leverage included AI workflows to develop intelligent virtual assistants for contact centers or digital fingerprinting to detect cybersecurity threats. Play video starting at :19:27 and follow transcript19:27 The entire software stack can be flexibly deployed across accelerated cloud, data center edge and embedded infrastructure. Wherever you choose to run your AI workloads. Applications can run anywhere that NVIDIA infrastructure is available with one license. NVIDIA AI Enterprise covers your AI center of excellence or COE needs partnered with the most experienced group of enterprise AI experts in the market with included enterprise support. NVIDIA AI platform offers cloud native hybrid optimized deploy anywhere on Prem and in the cloud. Reduced development complexity secure and scalable certifications with broad partner ecosystem. Improved AI model accuracy, standard support 9 by 5 premium 24 by 7. Now that you have a general understanding of NVIDIA AI enterprise and its benefits, let's turn our attention to GPU virtualization. NVIDIA offers a diverse range of SDKs, models, and frameworks. This slide provides a concise overview of their functions. For a deeper understanding of any specific model or framework, a quick Google search is recommended. To round up this discussion on the AI ecosystem, we will briefly cover NVIDIA's AI workflows. One question that frequently arises is whether there is a difference between an AI workload and a workflow. We believe there is a difference, and NVIDIA provides solutions to address both scenarios. There are customers who are running workloads already, and these can be accelerated by NVIDIA frameworks and libraries that leverage NVIDIA GPUs. Also, there are organizations who would like to deploy specific workflows but aren't quite sure how to build them or how to get started. For these customers, we've created AI workflows which are assembled, tested, documented, and customizable to provide customers a head start in solving specific challenges. Now that you understand the differences between workloads and workflows, let's explore another potential point of confusion. Let's explore NVIDIA's AI workflows available through NGC and the Enterprise catalog. These are pre packaged solutions designed to assist AI practitioners with specific use cases. Each workflow guides you through the necessary tools and steps to create and run a variety of workflows. These workflows have been fully tested and are vetted by NVIDIA. In the future, NVIDIA plans to introduce more AI workflows to cover a broader range of use cases. The majority of enterprises are now moving to adopt AI, but the vast majority are struggling with the complexity of getting to production. Our AI workflows are designed to give these customers a jumpstart on their AI journey, with pre packaged reference examples illustrating how NVIDIA AI frameworks can be used to build AI solutions. Included in our workflows are our AI frameworks, pretrained models, training and inference pipelines, Jupyter notebooks, and helm charts. All of these components are curated to help customers accelerate the path to delivering AI outcomes. The advantages for customers are two fold they can rapidly develop and deploy their solutions and produce solutions that provide the highest accuracy and performance. And if they encounter challenges, NVIDIA's enterprise support team is just a call away. Let's take a moment to summarize the unit and talk about the next step in your learning journey. Now that you've completed this unit, you should be able to describe VGPU, which serves as a foundation for the AI ecosystem. Describe the NVIDIA deep learning software stack and NVIDIA CUDA- X ecosystem define the steps in an AI pipeline workflow and identify some of the available tools to facilitate each step. Define what frameworks are and identify open source third party and NVIDIA frameworks. Describe the benefits of NGC and the enterprise catalog to provide building blocks in a DIY AI solution. Describe the benefits and use cases of NVIDIA AI enterprise. Describe provided AI workflows. Don't stop here continue the introduction to AI in the data center learning journey see you in the following unit. Welcome to Unit 6 where we delve into data center and Cloud computing, the environments that drive AI workloads. This unit begins with a short recap of prior units introducing key AI concepts and features, we then explore data centers and Cloud computing as environments for running AI workloads. The unit progresses by familiarizing us with the components of AI infrastructure. We close by shedding light on the aspects of operating an AI data center. By the end of this unit, you'll be able to summarize AI principles and features discussed in prior units, outline the hosting environments for AI workloads such as data centers and the Cloud, enumerate the components constituting AI data centers, and indicate the requisites and methods for managing and monitoring AI data centers. Our exploration has thus far underscored the immense value AI brings to diverse industries. Familiar technologies in our daily lives are increasingly powered by AI. With continual evolution into machine learning, deep learning, and generative AI each phase unlocking new capabilities. Generative AI emerges as a powerful tool facilitating the rapid generation of diverse content for creatives, engineers, researchers, scientists, and more. Its applications span industries producing novel content like stories, emails, music, images, and videos. The advent of accelerated computing, notably powered by GPUs, has become pivotal as CPU scaling reaches its limits. GPUs play a crucial role in providing the necessary processing power for complex AI workloads. Additionally, the importance of a suitable software stack cannot be overstated, acting as the backbone that orchestrates the seamless interaction between hardware and AI algorithms, ensuring optimal performance and efficiency in this rapidly evolving technological landscape. Where does this AI magic happen? Play video starting at :2:13 and follow transcript2:13 We begin by discussing the environments of where AI workloads run, data centers or the Cloud designed and built for computing. A data center is commonly described as a physical or Cloud facility designed to host essential business applications and information. These centers encompass various components that can be broadly categorized into three groups; storage, compute, and networking. Given that AI workloads are both data and compute intensive, traditional data centers may fall short. The massive datasets used by AI require high performance and high speed storage, while extensive computations demand execution on multiple accelerated systems. To achieve this, multiple compute, storage, and management nodes are networked to form a cluster. The interconnected network must provide high performance and low latency to avoid becoming a bottleneck. At the same time, these specialized data centers are equipped with power and cooling infrastructure to optimal hardware functionality. In upcoming units, we'll delve into the essential infrastructure components for AI supportive data centers and explore the fundamentals of managing and monitoring these dynamic environments. Let's embark on this journey together. In the next phase, we'll explore how IT leaders build and scale their data centers infrastructure to readily adopt AI. AI applications demand significant computing power driven by both training and inference workloads. Accelerated systems utilizing high powered processors, memory, and GPUs efficiently process large amounts of data distributed across interconnected nodes. GPU accelerated servers are offered by your OEM of choice. Play video starting at :4:1 and follow transcript4:01 AI workloads involve large computations which are distributed over hundreds and thousands of nodes. Distributed computing involves the utilization of multiple interconnected nodes working together to perform a task or execute a process. In this model, the workload is distributed across various machines connected by a high speed, low latency network. AI workloads have introduced new challenges and requirements for data center network architectures. The network defines the data center and serves as the backbone of the AI infrastructure. It is essential to consider the networks capabilities and end to end implementation when deploying data centers for AI workloads. Play video starting at :4:43 and follow transcript4:43 Accelerate systems provide massive computational power for AI training and inferencing. Completing those jobs in a timely manner requires high sustained rates of data delivery from the storage. High speed storage access is crucial for AI work loads, enabling rapid data access and transfer rates for improved performance and reduced latency. Data centers must provide sufficient storage capacity and address considerations such as capacity, performance, network hardware, and data transfer protocols. AI applications demand more power for computations increasing power usage and generating heat. Inefficient cooling can result in reduced equipment life, poor computing performance, and greater demand on cooling systems. Sustainable computing maximizes energy efficiency which is crucial to reducing the environmental impact of technology growth. Adopting sustainable practices helps data centers lower their carbon footprint and energy use. Some factors to maximize energy efficiency include accelerated computing and efficient cooling. Accelerated computing is the most cost-effective way to achieve energy efficiency in a data center. By utilizing specialized hardware to carry out certain common, complex computations faster and more efficiently, data centers can perform more computations with less energy. Efficient cooling technologies, like direct liquid cooling, efficiently dissipates heat, offering energy saving advantages, such as improved heat transfer, reduced air flow needs, targeted cooling, and waste heat reuse. Play video starting at :6:24 and follow transcript6:24 Let's begin with an overview of reference architectures. Dense computing environments include many components. There are multiple servers for compute, networking fabrics that connect the systems, storage for data, and management servers. Play video starting at :6:42 and follow transcript6:42 Designing systems to get maximum performance can be very difficult. Reference architectures are documents showing a recommended design for the implementation of a system. It uses best-of-breed designs to provide high-performance solutions. Play video starting at :7:1 and follow transcript7:01 Dense computing environments include many components. There are multiple servers for compute, networking fabrics that connect the systems, storage for data, and management servers. Designing systems to get maximum performance can be very difficult. Reference architectures are documents showing a recommended design for the implementation of a system. It uses best-of-breed designs to provide high-performance solutions. Reference architectures can be used as a foundation for building designs using systems and components. As AI technology continues to advance and integrate into enterprise operations, the challenge of building and maintaining a robust on-prem AI infrastructure becomes critical. Cloud-based solutions, especially those leveraging GPUs, offer a flexible and accessible alternative to physical data centers. Play video starting at :7:55 and follow transcript7:55 The AI data center infrastructure section provides six comprehensive aspects to guide the design of data centers for AI workloads. After establishing a data center, effective management and monitoring become imperative. Let's explore some of the related aspects. Play video starting at :8:15 and follow transcript8:15 Managing IT infrastructure for AI poses unique challenges for IT admins, data scientists, and line of business owners. The complexity of modern data science workloads, incorporating GPU acceleration, and high speed networking requires specialized attention. Infrastructure provisioning, IT admins, navigate, diverse, often container-based data science workloads distinct from traditional enterprise operations. Managing complex computing infrastructure involving GPU acceleration and high speed networking is a critical responsibility. Workload management, data scientists are tasked with more than using their laptops. They require access to centralized compute resources, but often lack the IT knowledge to independently utilize these systems. Moreover, they need scalable access to resources as their needs expand from experimentation to larger scale testing. Resource monitoring, line of business owners must ensure optimal use of compute resources, relying on relevant and accurate data about resource usage to make informed business decisions for their stakeholders. Additional aspects for operating an AI data center include container orchestration and job scheduling. Container orchestration and scheduling play pivotal roles in the efficient management of AI data centers. Container orchestration involves automating container-related operations, including provisioning, deployment, management, scaling, and load balancing. Orchestration tools handle these tasks based on the managed environment's specific needs. For advanced scheduling, an additional scheduling tool can be employed in conjunction with an orchestration tool. Scheduling is the process of assigning work loads to available compute resources. Schedulers provide a framework for launching and monitoring jobs, managing jobs in the queue, and ensuring that jobs receive the necessary resources to run. Play video starting at :10:19 and follow transcript10:19 The AI data center operation section consists of two units providing details and considerations on how to effectively operate your AI data center. Let's summarize what we've learned. Now that you've completed this unit, you should be able to summarize AI features discussed in prior units. Outline the hosting environments for AI workloads, such as data centers and the Cloud. Enumerate the components constituting AI data centers, and indicate the requisites and methods for managing and monitoring AI data centers. Now proceed to Section 2, delving into AI infrastructure, where you'll begin your exploration of its components. Starting with Unit 7, that addresses compute platforms designed for AI. See you in the next unit. Now let's start by reviewing GPUs and CPUs that power AI workloads in the data center. As we've already seen in earlier units, both CPUs and GPUs are components of a system that work in tandem to process code and data. While CPUs are designed for complex instruction sets and have evolved to include multiple cores for increased performance, GPUs are designed for simple instruction sets and have a larger number of cores for simultaneous processing. Together, they provide a powerful combination for executing code and manipulating data. There are different GPU and CPU architectures for different workloads. Let's explore some of those. GPU architecture is everything that gives GPUs their functionality and unique capabilities. It includes the core computational units, memory, caches, rendering pipelines, and interconnects. GPU architecture has evolved over time, improving and expanding the functionality and efficiency of GPUs. Let's review some of the latest NVIDIA processor architectures. Starting with GPUs. Hopper GPU architecture is the latest generation of accelerated computing, setting a new standard and accelerating large-scale AI and HPC built to speed up and optimize the world's largest language models used in applications such as recommender systems, conversational AI, and language processing. The Ada Lovelace GPU architecture has been designed to provide revolutionary performance for gaming, image generation, AI video, and mainstream generative AI applications. Ampere architecture was introduced in 2020 and is NVIDIA's GPU from the previous generation for deep learning training, inference, and HPC mainstream performance. Lastly, is the Grace Specialized CPU architecture designed for high-performance computing and data centers. Grace CPUs are intended to provide exceptional AI and high-performance computing capabilities by combining Arm's CPU architecture with NVIDIA's expertise in AI and parallel computing. The H100 GPU based on the Hopper architecture includes a new transformer engine built to deliver the power and performance needed by transformer-based models, the foundation of natural language processing tasks. The transformer engine has been pivotal for the development and acceleration of generative AI applications, such as recommender systems, image creation, natural language processing, and language translation. Built with 80 billion transistors, it's the world's largest and most powerful accelerator, delivering unprecedented performance, scalability, and security for any and every data center. It is also the first GPU with confidential computing, a security approach that enables the processing of sensitive data in an encrypted and isolated environment. It is designed to address security concerns related to data privacy and protection, particularly in cloud computing environments. The H100 has the fastest, most scalable interconnect with 900 gigabytes per second GPU to GPU connectivity, enabling acceleration for the largest AI models. In addition, the H100 GPU supports multiple fully isolated and secured instances or multi-instance GPU, which allows multiple users or applications to share the same GPU while maintaining data privacy and security. Curious why it was named Hopper? The Hopper architecture was named in honor of Grace Hopper, a pioneering US computer scientist. She was one of the first programmers of the Harvard Mark I computer and invented one of the first linkers. Her work laid the foundation for many advancements in computer science, and her legacy continues to inspire future generations. The L40S GPU based on the Ada Lovelace architecture is built to power the next generation of data center workloads, from generative AI and large language model inference and training to 3D graphics rendering and video. It delivers accelerated AI performance with fourth generation tensor cores, which are specialized hardware units found in NVIDIA GPUs. Tensor cores provide groundbreaking performance for deep learning, neural network training and inference functions that occur at the edge. Advanced video acceleration is also a key feature of the Ada Lovelace architecture. All Lovelace architecture based GPUs deliver scalability and security, and deliver up to twice the power efficiency of the previous generation. By naming this GPU architecture after Ada Lovelace, NVIDIA pays tribute to the legacy of another inspiring woman in science and mathematics. Ada Lovelace was a visionary mathematician and writer known for her pioneering work on Charles Babbage's Analytical engine. Her notes and insights published in the mid 19th century are considered the first computer program and foretold the concept of general purpose computing. Her contributions to the field of computing have earned her recognition as the first computer programmer and a lasting legacy in the history of technology. The NVIDIA A100 GPU is our prior generation flagship GPU built on the Ampere architecture. It is a powerful and advanced graphics processing unit designed for data center and high performance computing applications. It offers several key features that make it well-suited for various compute intensive tasks. The A100 is designed to deliver exceptional compute performance. It is capable of handling complex simulations, scientific computing, AI training, and more. Andre-Marie Ampere was a French physicist and mathematician and one of the founders of the science of classical electromagnetism. He formulated Ampere's Law, which is a fundamental principle in electromagnetism and plays a crucial role in understanding the behavior of electric currents and magnetic fields. NVIDIA Ampere architecture was named after Andre-Marie Ampere to honor his contributions to science and to symbolize the architecture's capacity to drive cutting-edge computational tasks. The NVIDIA Grace CPU is the first CPU designed by NVIDIA for the data center. It is built on ARM architecture and designed specifically for high performance computing applications. In addition to HPC, the Grace CPU is also ideal for use in Cloud computing and hyper-scale data centers. Its energy efficiency and scalability make it well suited for these environments where large numbers of CPUs are used to power demanding workloads. The Grace CPU is also a good fit for applications that require large amounts of memory and high bandwidth, such as genomics, computational fluid dynamics, and quantum chemistry. These applications often involve processing large data sets and performing complex calculations, making the Grace CPU's support for large amounts of memory and bandwidth a key advantage. The Grace CPU architecture is the foundation of two separate super chips. The first is the NVIDIA Grace Hopper super chip that combines the Grace CPU with the powerful H100 GPU, making it the most versatile compute platform for scale out. By using NVIDIA NVLink chip-to-chip interconnect, it supports high bandwidth, coherent data transfers between the GPU and CPU, yielding a large unified memory model for accelerated AI and high performance computing, or HPC applications. It shines when CPU performance and memory bandwidth are critical for applications such as recommender systems, graph databases, and scientific computing. The second is the NVIDIA Grace CPU super chip designed for CPU based applications where absolute performance, energy efficiency, and data center density matter, such as scientific computing, Cloud, data analytics, enterprise, and hyper-scale computing applications. The Grace CPU super chip represents a revolution in compute platform design by integrating the level of performance offered by a flagship x86-64 two-socket workstation, or server platform into a single super chip. Grace is designed for a new type of data center, one that processes mountains of data to produce intelligence. Hopper and Ada Lovelace are in NVIDIA's latest GPUs, excelling in diverse workloads and showcasing NVIDIA's cutting-edge technology. Known for adaptability and high performance, they provide computing for the largest workloads such as generative AI, and natural language processing, and deep learning recommendation models. Ampere GPUs, our previous architecture of GPUs, are renowned for their versatility. They find applications in a wide range of domains, from deep learning training inference and high-performance computing to 3D rendering and virtual production in media and entertainment. [SOUND] Welcome to Unit 13. In this unit, we'll be covering tools for management and monitoring AI clusters. In this unit, we'll begin with an overview of cluster management and monitoring. We'll discuss infrastructure provisioning and management and monitoring for resources and workloads. Finally, well give an overview of NVIDIA's Base Command Manager software. By the end of this unit, you'll be able identify the general concepts about provisioning, managing, and monitoring AI infrastructure. Describe the value of cluster management tools. Describe the concepts for ongoing monitoring and maintenance. And identify tools that are used for provisioning, management, and monitoring. There are three main concepts to consider when considering managing AI infrastructure. First is infrastructure provisioning. Provisioning is the process of setting up and configuring hardware. This includes the servers, switches, storage, and any other components of the AI cluster. The next concept is resource management and monitoring. This includes getting metrics and data from the resources in the cluster to determine how the cluster is performing and to make any updates or changes. The final concept is workload management and monitoring. This is how we ensure the data scientists and AI practitioners have the tools they need and understand the usage of the cluster. In the rest of this unit, we'll discuss these concepts in more detail and discuss some tools that can be used to accomplish the related tasks. In this section, we'll talk about infrastructure provisioning. Once the infrastructure is installed in the data center or procured in the cloud, the hardware needs to be provisioned before it can be used. The installed hardware may not have the latest or correct versions of software or firmware installed by default. The versions necessary will be dependent on the collective components in the data center and the workload needs. In preparation for provisioning, the correct versions of software and firmware must be determined and downloaded. Once the firmware and software are retrieved, the systems can be updated. This can include the operating system, GPU drivers, networking drivers, management tools, and any applications that need to be run on the servers, switches, and storage. The process can also include updating firmware for the hardware. When provisioning is complete, the compute nodes, GPUs, storage, and networking should be ready for workloads to be run on the system. There are several tools available for provisioning servers and systems. Ansible is an open source, command-line it automation software application that can configure systems and deploy software. Terraform is an infrastructure as code tool that lets you define both cloud and on-prem resources. Foreman is an open source project that helps system administrators manage servers throughout their lifecycle, from provisioning and configuration to orchestration and monitoring. In addition to the tools mentioned here, there are many other tools for on-prem and cloud-based provisioning and configuration. Next, we'll talk about resource management and monitoring. Managing and monitoring resources in a data center go hand in hand. There are standard maintenance tasks that must be done for the data center systems. Monitoring these systems can give insights into which systems might need attention beyond regular maintenance. Managing resources like compute nodes, network fabric, and other cluster components is critical to keeping a high performance cluster operating at its best. Let's discuss some of the monitoring and management tasks for an AI cluster. For the compute nodes, the overall system health of the nodes should be monitored as well as metrics on any special hardware like GPUs. Management tasks include installing patches and updates for security flaws, keeping the firmware up to date, installing and maintaining drivers, and replacing failing components. Network congestion, connection quality, and connectivity across the network should be monitored. This provides information on possible network issues like cable degradation or lost connections that could require changing out faulty cables. In addition, as workloads change or the cluster grows, the networking topology, bandwidth, or other factors may need to be upgraded. In addition to the compute nodes and the networks, other cluster components such as storage and maintenance nodes need to be monitored and managed. This includes monitoring disk space usage and the health of management nodes. Management also includes ensuring that management node software stays updated, that cloud and on-prem tools work together, and that the correct users are authorized and able to access and use the cluster. Depending on the configuration and servers in an AI cluster, there are a variety of tools that can be used for management and monitoring. Redfish by the Distributed Management Task Force, is a standard designed to deliver simple and secure management of servers, networks, storage devices and other infrastructure. It can be used for many of the management tasks in an AI data center. For data centers with NVIDIA GPUs, the Data Center GPU management, or DCGM, exporter tool is a tool based on the Go APIs to NVIDIA DCGM that allows users to gather GPU metrics and understand workload behavior or monitor GPU's in clusters. Prometheus is an open source monitoring system that collects and stores metrics. It works in connection with Grafana, a tool used to visualize time series data. As with provisioning, there are a multitude of management and monitoring tools available for AI clusters. We only shared a small subset of the tools here. Next, we'll talk about workload management and monitoring. Workload management and monitoring are critical tasks in an AI cluster. Workload management includes tasks like making sure workloads get the resources they need to run. Having access to the necessary number of GPUs or CPU cores is critical. Once the needs are determined, the jobs must be scheduled to run on the compute nodes. If a job has an issue, it may need to be stopped, and failed jobs may need to be restarted. Monitoring of workloads includes checking the efficiency of their resource usage. This can include if the GPUs and CPUs are being properly utilized, and if there are memory or storage concerns with certain workloads. Workload monitoring also includes monitoring the status of jobs to provide results or release resources back to the cluster. There are a variety of tools used for workload management. Many of these tools also include some monitoring capabilities or share job metrics with other tools. Some of the more common workload management tools are Jubernetes, Jupyter Lab, and Slurm. Kubernetes is an open source tool that manages and orchestrates containerized workloads. It has operators that allow it to work with NVIDIA GPUs, share metrics with Prometheus, and integrate with advanced schedulers like Run:AI. Jupyter is an open source web application used by data scientists and machine learning professionals to create and share computational documents, like code and data visualizations. It's an interactive web application that can be launched and run from within containers. Slurm is a tool created by SchedMD that's used to simplify resource sharing on large AI and HPC clusters. It's an advanced scheduling tool that can run interactive and batch jobs, prioritize jobs, and allows for resource reservations and per job quality of service settings. In the final section of this course, we'll give an overview of NVIDIA Base Command Manager, NVIDIA's set of tools to deploy and manage an AI data center. In the prior sections, we have discussed the importance, provisioning, monitoring, and management needs for AI clusters and workloads. We've also discussed a variety of tools to handle these tasks. NVIDIA Base Command Manager, or BCM, is a comprehensive software system that can handle the tasks of provisioning, managing, and monitoring an AI data center. It handles infrastructure provisioning, user access, and workload management and resource monitoring, sets up networking, security, and DNS, and ensures cluster integrity. It also automates server management and updates, preventing server drift. BCM can deploy Kubernetes and Slurm and other workload managers. It also allows for streamlined setup of Jupyter Lab. Finally, BCM has built-in job monitoring and metrics for GPUs and other cluster resources. Base Command Manager has a lot of capabilities that translate to three key value propositions. It, one, accelerates time to value by simplifying infrastructure bring up and workload provisioning so that data scientists can get resources they need quickly. Two, reduces complexity and effort by automating the management and operations of accelerated infrastructure. Three, enables agility by streamlining the ability for the infrastructure to support many different types of workloads with dynamic resource allocation based on business needs. All of these benefit the whole AI team, from IT to data scientists to line of business owners. Let's summarize what we've learned. Now that you've completed this unit, you should be able identify the general concepts about provisioning, managing, and monitoring AI infrastructure. Describe the value of cluster management tools. Describe the concepts for ongoing monitoring and maintenance. And identify tools that are used for provisioning, management, and monitoring. Don't stop here. Continue the learning journey with Unit 14, Orchestration and Scheduling. See you in the next unit. Welcome to Unit 12-1 AI in the Cloud unit overview. As AI expands its reach and Cloud computing becomes the go to platform for running AI workloads, this unit delves into the various subunits that comprise AI in the Cloud. Here we'll outline the structure of the unit and define the learning objectives. Let's get started. Let's examine what's in store for this unit. Introduction to AI in the Cloud covers the challenges, benefits, and unification of these two groundbreaking technologies. AI use cases in the Cloud details the wide variety of AI use cases that can be deployed in the Cloud, considerations when deploying AI in the Cloud, the supported Cloud service providers and the various ways to consume their services, finally, the NVIDIA solutions that run in the Cloud. Now that we've covered this unit's foundation, let's explore the learning objectives. Upon completing this unit, you'll be able to explain the multitude of ways Cloud computing enhances AI deployments. Describe the wide variety of AI use cases in Cloud computing environments. Outline the key considerations and strategies when deploying AI in the Cloud. Summarize the wide variety of Cloud service providers that support NVIDIA technologies and solutions. Categorize the various Cloud consumption models when deciding how to consume Cloud services. Evaluate NVIDIA Cloud solutions and how they can benefit your work loads. Let's begin with a powerful quote from NVIDIA CEO Jensen Huang as we explore the exciting possibilities of AI in the Cloud and how it's transforming the way we live, work and play. AI in the Cloud is the future of computing. It's the next generation of computing and it's going to transform every industry. This quote from Jensen Huang, the CEO of NVIDIA highlights the potential impact of AI in the Cloud for various industries and the future of computing as a whole. It's an impact statement that reflects the growing importance of AI and Cloud computing in the technology landscape. Let's begin our journey into the Cloud with unit 12-2 introduction to AI in the Cloud. This sub-unit explores several considerations for deploying AI in the Cloud. Before we dive into the considerations, it's crucial to comprehend the AI maturity model and determine where your organization stands within it. This understanding will guide your decision-making and ensure that your Cloud strategy aligns with your AI capabilities and goals. Navigating the complexities of AI can be challenging, but strategic decisions and investments can help organizations gain a competitive edge and progress along the AI growth curve. The AI Maturity Model is a framework for assessing the level of maturity of an organization's AI capabilities. The model consists of five stages, each representing a higher level of maturity and capability. Awareness. At this stage, organizations are just beginning to explore AI and understand its potential benefits and challenges. There may be some early conversations and experiments with AI, but no formal AI strategy or investments have been made. Active AI experimentation. Organizations at this stage are actively experimenting with AI in a data science context, such as using machine learning algorithms to analyze data and identify patterns. They may have a small team of data scientists and engineers working on AI projects, but AI is not yet pervasive throughout the enterprise. Operational AI in production. At this stage, organizations have successfully implemented AI in production environments and are using it to automate processes and improve decision-making. They've established best practices for AI development and deployment and have access to a wide range of AI technologies and experts. Systemic AI. Organizations at this stage have fully integrated AI into their digital strategy and are using it to drive innovation and growth. AI is considered a core component for all new digital projects and the organization has a well-defined AI roadmap and strategy. Transformational AI. At this stage, AI is deeply ingrained in the organization's DNA and is a key driver of business strategy and operations. The organization has a strong AI culture and has fully integrated AI into all aspects of its business, from product development to customer service. By assessing your organization's current level of AI maturity using this model, you can identify areas for improvement and develop a roadmap for advancing your AI capabilities to drive business growth and success. Let's explore some considerations starting with AI on-prem. Before delving into these factors, it's helpful to determine your current starting point and operational mindset, as this will serve as a foundation for your exploration and decision-making process. On one end of the infrastructure spectrum are those who start on-prem. An organization can operate on-prem and still be in the very early stages of the AI maturity model. Having your own gear doesn't mean you've reached critical mass of model prototyping volume or have built a production workflow for AI apps or have consolidated development silos onto shared infrastructure. It simply means that you looked at the ongoing cost of reserved Cloud instances versus the fixed cost of a system and decided the TCO, or total cost of ownership, was in favor of ownership. This is the CapEx versus OpEx consideration. It might also mean that keeping your data within the four walls of your data center is paramount. By maintaining control over your data on-premises, you can ensure data sovereignty and adhere to regulatory requirements, providing a critical layer of security and compliance. By deploying AI on-premises, you can quickly spin up and down resources to accelerate your AI development cycles, achieving the fastest iteration speed possible. With on-premises AI, you can enjoy predictable costs that scale linearly with your usage, allowing you to better plan and budget for your AI initiatives. Exploring these considerations can help you prepare and design a path to your ideal AI solution. Now let's turn our attention to the alternative approach of starting AI in the Cloud and explore the considerations of that path. On the other end of the infrastructure spectrum are those who start in the Cloud. For many organizations, a Cloud-first or Cloud-only approach is the starting point for their AI infrastructure strategy. This guiding principle often influences the decisions they make about how to deploy and manage their AI systems. By leveraging Cloud-based AI services, you can elastically scale your AI resources to meet changing temporal needs without being limited by fixed on-premises infrastructure. There are minimal barriers to entry. All Cloud providers offer Nvidia GPU instances, and you can easily turn them on and off like a faucet. This elasticity is ideal for organizations in the early stages of their AI journey. They're still experimenting with productive AI applications and don't yet have ongoing resource demands. Training runs are short since their models are small and their datasets are limited. However, as they mature in their AI capabilities, they'll begin to see AI as essential and require more advanced and specialized resources to support their growing models and datasets. This will lead to an increase in Cloud operating expenses. We know that many customers are already experiencing this inflection point. It's important to consider the trade offs which we'll address next. Which route should you take, Cloud or on-prem? Imagine an AI workflow where data is stored locally while compute resources are based in the Cloud. You could experience a performance boost of 2-6 times with specialized AI infrastructure compared to Cloud based solutions. Additionally, every 60 miles of distance between the data and the Cloud can result in a one millisecond increase in latency driving up the cost of data gravity. Furthermore, 62% of IT decision makers at large enterprises believe that their on premises security is stronger than Cloud security. As you navigate your AI journey, you face the question of whether to use Cloud or on-premises infrastructure for your AI workloads. However, it's not a one or the other scenario. A hybrid approach is necessary, leveraging both Cloud and on-premises solutions depending on where you are in your AI journey. Initially, you may start with a modest experimental approach in the Cloud, using Cloud hosted training capacity to quickly get started with minimal resources and budget. The Cloud is a great place to build skills and experiment with AI. However, as you scale and your data sets grow larger, an inflection point is reached. You may need to transition to on-premises solutions to support production scale AI applications. This is when the impact of data gravity becomes more pronounced and developers spend more time grooming each training run to avoid failure. This slows down iteration speed and can stifle innovation. To overcome these challenges, you need to adopt a hybrid approach that leverages the strengths of both Cloud and on-premises solutions. By doing so, you can optimize your AI workflow, reduce costs, and improve the speed and efficiency of your AI development processes. Let's explore this need for flexibility further. Here's a common use case that leverages a hybrid approach to AI deployment. Training, customization, and optimization take place on-premises utilizing local resources. This approach offers several advantages, including full control over the training environment. By training models on-premises, organizations can maintain complete control over the training process, including the hardware, software, and data used. Maintaining data sensitivity and compliance. On-premises training allows organizations to keep their data within their own networks, ensuring that it remains secure and compliant with regulations. Data gravity alignment. By training models on premises and then deploying them in the Cloud, organizations can align their data gravity with their AI workloads, reducing latency and improving performance. These models are then loaded into production inference within the Cloud, leveraging the scalability and flexibility of Cloud infrastructure to efficiently handle varying work loads during the inference phase. This hybrid approach allows organizations to take advantage of the strengths of both on-premises and Cloud based infrastructure, while minimizing the weaknesses and risks associated with each. Let's close out this topic with a recap of the key considerations. Customer needs around the deployment and development of AI are evolving. Influenced by factors such as data location, application specificities, and enterprise IT strategy. Some enterprises adopt a Cloud first approach, while others prefer and own the base rent the SPIKE strategy or a multi Cloud hybrid model AI work loads may need to remain on-premises or in specific geographic locations within the public Cloud due to constraints related to real time performance, data residency, or data sovereignty requirements. As enterprises diversify their IT strategies, there's a growing need for AI platforms that offer flexibility in developing and deploying AI applications across various environments. Data locality. Moving compute closer to where the data resides to minimize network congestion and improve application performance. Data sovereignty. Adhering to country specific requirements, governing where data originating in a geographic location must be stored and processed. Hybrid IT strategies. Growth in hybrid cloud multi Cloud approaches to leverage best of breed solutions for AIPOC's training and deployment at scale. Real time performance. Supporting applications that need to respond in real time or provide real time analytics and insights based on sensor generated data. Great momentum. Don't stop now. Next up is Sub-unit 12.5, which details the supported Cloud service providers and their consumption models. Welcome to Unit 7, compute platforms for AI. Topics covered in this unit include data center platform, GPUs and CPUs for AI data centers, multi GPU systems, introducing DPUs, and NVIDIA-certified systems. By the end of this unit, you will be able to, indicate the key components and features of the NVIDIA data center platform. Identify the GPU and CPU requirements for AI data centers, the different products available, and their intended use cases. Understand the purpose and capabilities of multi GPU systems. Describe the multi node GPU interconnect technology. Determine the role of DPUs and DOCA in an AI data center, and evaluate the benefits of using NVIDIA-certified systems. Let's get started. First, we'll present the NVIDIA data center platform and review some consideration and requirements for building compute platforms for AI. Modern data centers are key to solving some of the world's most important scientific, industrial, and big data challenges using high performance computing and AI. Accelerated computing is often referred to as a full stack challenge because it involves optimizing and integrating various components across multiple layers of the technology stack to achieve optimal performance for specialized workloads. Furthermore, accelerated computing represents a data center scale issue as the modern data center essentially serves as the computer applications span the entire data center making it crucial to optimize all the diverse components within it at the foundation are the hardware, technologies, GPUs, CPUs, and DPUs that form the basis for building servers. Sitting atop these servers is this software stack encompassing CUDA and DOCA, the programming models for GPUs and DP use respectively, along with numerous software libraries that transparently provide acceleration to developers across different hardware products, such as CUDA-X for GPU acceleration. In addition, we offer application frameworks tailored for common domains. Some examples include Riva for conversational AI, Drive for autonomous vehicles, Merlin for recommendation systems, and many others. Customers leverage our comprehensive stack to develop and run their applications effectively. As mentioned earlier, the data center is now the new unit of computing. To make this model work, it requires three pillars, the CPU, the GPU, and the DPU. The GPU is used for accelerated computing, performing parallel processing at the enormous scale required for graphics and AI. The CPU continues to perform general application processing, especially basic single thread applications, which it is good at. The DPU comes in to handle data-intensive functions like communications processing, compression, and encryption to keep the data center running efficiently. The combination of the GPU, the DPU, and the CPU is now the new unit of computing. We'll talk about each of those in more detail in the following slides. An accelerated system is the next phase in the evolution of computers. Just like how all smartphones today have processors for graphics and AI, so too will every server and workstation have compute accelerators to power today's modern applications, including AI, visualization, and autonomous machines. Many of these systems will also have data processing units which accelerate the network, storage, and security services that are central to cloud native and cloud computing frameworks. Leveraging cloud service providers or CSPs grants customers access to computing infrastructure and resources without the need for management and maintenance. We will delve into cloud based solutions in greater detail in a later unit. Additionally, OEM systems are readily accessible. NVIDIA collaborates with numerous reputable, established, and certified vendors offering flexibility in adopting solutions built from readily available components. Lastly, the NVIDIA DGX A100 and H100 systems are purpose-built with optimized components, including networking, storage, and compute. Customers who opt for DGX solutions gain access to NVIDIA's expertise, which can assist them in deploying and maintaining their solutions effectively. As organizations seek to build an AI application, they follow a workflow that begins with ideation, and is ultimately realized as a trained model running in a production setting. The process to go from an initial concept to a production application involves several phases enacted by a team that includes data scientists, data engineers, business analysts, DevOps, and potential application developers working in concert. The workflow shown here is an idealized example to showcase the key phases of this development process. With cloud-based GPU solutions, enterprises can access high density computing resources and powerful virtual workstations at any time from anywhere with no need to build a physical data center. From virtual desktops, applications, and workstations to optimized containers in the cloud, data scientists, researchers and developers can power GPU accelerated AI and data analytics at their desks. GPU-accelerated data centers deliver breakthrough performance for compute and graphics workloads at any scale with fewer servers resulting in faster insights and dramatically lower costs. Sensitive data can be stored, processed, and analyzed while operational security is maintained. AI at the edge needs a scalable, accelerated platform that can drive decisions in real-time and allow every industry to deliver automated intelligence to the point of action, stores, manufacturing hospitals, and smart cities. Welcome, in this unit, we'll discuss storage considerations for AI. Topics covered in this unit include, storage requirements for AI workloads, storage file system types, Nvidia validated storage partners, storage considerations summary. By the end of this unit, you should be able identify the storage requirements necessary for AI workloads, explain the key concepts of storage file systems, and apply them in relevant scenarios. Comprehend the benefits of using validated storage partners in an AI data center. Summarize storage considerations for AI workloads. Deep learning has become relevant in today's business environment because of the availability of fast computing and massive amounts of data. Model accuracy is often correlated with model complexity. In other words, as model complexity increases, it can better characterize the dataset, leading to increased accuracy. However, more complex models often require more data. For image classification tasks, training datasets can consist of millions or even billions of images. In autonomous driving, a single camera at 1080 pixels resolution can capture half a terabyte of data in 30 minutes. For natural language processing, billions of records are created every day through email, texts, tweets, and others. Most of this data is going to be stored somewhere, and the storage needs to be fast, flexible, and scalable so that it can be read and reused by many applications. Data must be stored in such a way that it can be effectively used and easily and quickly recalled by all users authorized to do so. Storage performance is often characterized in input output operations per second, or IOPs, as well as bandwidth and metadata operations. IOP's and bandwidth both represent how quickly data can move between the storage system and the servers. Metadata operations are those related to finding, querying, and manipulating the stored data and its structure. Ideally, the data should be visible across the IT infrastructure, which simplifies the use and management of the data throughout the enterprise. It must be labeled in such a way that it is easy to understand what it is and from where it came. Storage mechanism needs to provide methods to know if the data remains the same as when it was written and provides resiliency so that in the event of an eventual failure, the data can still be recalled or reconstructed into its original state. However, to build storage systems that are resilient, robust, fault tolerant, performant, and that provide a shared view is difficult. There are many solutions today, each with their strengths, but it is important to understand the end users needs to match those correctly. When deciding on a storage solution for an environment, the full lifecycle of the data should be considered. The following questions should be asked. How will the data be written? How will it be read? How often will it be accessed? Is the storage able to provide data fast enough? Who needs to access it? What should be done when there are system failures? Are there any potential concerns about privacy of the data? When should the data be retired? Understanding the answers to these questions will help select the most appropriate solution. Now let's review some of the storage file systems and their usage in AI data centers. There are many different storage systems available today that meet a variety of needs. Most servers have a local storage system. Local storage systems are fast, can provide strong performance, and are relatively simple when compared to network and shared storage systems. However, they are not shared for multiple applications to work on the same data, that data would have to be duplicated across those systems that want access. Network file systems provide a local like view of data to a group of servers. This is often accomplished using open standards based protocols to allow access by servers across different operating systems. Parallel and distributed file systems share data across a group of servers and scale out to the group in both performance and capacity as needed. Often, parallel file system clients rely on custom methods to provide a local like view of the data. Depending on the type and scale of a distributed file system, it can offer the highest read and write speeds over other storage systems. By far. Object storage systems provide ways to scale storage systems massively in capacity. They do this by providing APIs for accessing the data instead of providing a local like view of the data as other systems. However, object storage is not a standard, so applications must be rewritten to directly access the data. There are other methods of storage for large amounts of data, including SQL, NoSQL, and SQL-like databases. While these provide unique performance characteristics and access methods to store and retrieve records of data, they are not as general as the other file system types discussed. The most common shared file systems are based on the network file system protocol, or NFS, which is a standard protocol. NFS was developed by Sun Microsystems in 1984 to provide a way to share data across multiple servers. Files are stored in blocks similarly to how they're stored on a local file system. On the servers, the network file system appears to be local. Data on NF storage is accessed via the portable operating system interface, or POSIX, which defines the behavior for operations such as open read, seek, and close. NFS is a reliable solution with decent read and write performance and a simple interface. NFS appliances often have many mature features to improve usability, resiliency, and manageability of the system, including snapshots, data replication, and performance profiling tools. Nvidia storage partners that provide network file systems include NetApp, PureStorage, and Dell EMC. Parallel and distributed file systems are designed to scale out in both capacity and performance by allowing the spreading of storage across multiple storage units that are all connected with a high speed network. Parallel file systems divide files into small chunks, say 1 mb in size, and spread those chunks across different storage devices. These file systems also use the POSIX standard to present data similarly to a local file system to the clients, but the file system client is unique to each file system. Distributed file systems can store files on a single storage unit but still allow for the scaling of total performance and capacity through aggregating multiple servers that can access the data. They can often provide better single threaded and multithreaded performance when trying to maximize single node or multiple node aggregate performance. Custom alternatives to the NFS clients are used to maximize performance and support alternate communication networks such as Infiniband. Nvidia's storage partners providing parallel and distributed solutions include Data Direct Networks, IBM, and Weka IO. Object storage systems are designed to provide shared access to data in a more simplified manner than the other file systems discussed. They are designed to easily scale to petabytes and beyond. Even exabyte storage pools can be created. Object storage systems have no directory structure. Files are stored as blobs or buckets and referred with keys. These are key value pairs where a key can point to an entire file. Data resiliency is provided through data replication. The standard method of access is via a representational state transfer or rest API. These rest APIs sit on top of specific data access protocols so that each object storage pool is accessed differently than the others. Object storage systems are traditionally used for the largest cloud repositories, whether public or private. They are used to retrieve data to a local, network, or parallel file system. Examples of object storage systems include Amazon's Simple Storage Service, Google Cloud storage, OpenStack, and Microsoft's Azure blob storage. Let's review validate storage partners and benefits using validated storage partners solutions. A validated storage partner is a company that collaborates with Nvidia to ensure compatibility between their storage products and Nvidia's data center of solutions, including DGX SuperPOD and DGX BasePOD. Working with a validated storage partner guarantees seamless integration between storage and Nvidia hardware in data centers. Customers can trust that their storage will function optimally with Nvidia systems, enabling them to leverage the full performance and capabilities of their Nvidia hardware. Validated storage partners work hand in hand with Nvidia, fine tuning their products to optimize performance with Nvidia hardware. This meticulous optimization process can unlock substantial performance improvements. The benefits don't stop at performance. These products are put through a stringent testing process, ensuring their reliability and capability to withstand the demands of intense data center workloads. Nvidia validated partners offer a comprehensive range of products designed to scale and meet the needs of large scale data center deployments. This scalability ensures that as the needs grow, the solutions can grow accordingly. In today's digital age, security is paramount. Nvidia validated partners understand this and offer a variety of security features designed to safeguard data from unauthorized access. The validated partners can help reduce costs. They provide optimized solutions tailored to meet the specific needs of data center deployments, eliminating the need for expensive customizations or upgrades. AI applications need a very large amount of data storage. While the data is primarily read by the AI applications, write is also an important part of the overall storage solution. All the pieces of the storage hierarchy, local file systems that can be used as data cache, network file systems, parallel distributed file systems, and object storage have their strengths. It is not a simple task to group these different technologies into one storage bucket. Often, the traditional network file systems use a scale out approach, which is like the distributed file systems. Some file systems provide object storage access along with the local like view of data. Parallel and distributed file systems often provide NFS support for additional compatibility. As storage for a particular purpose is evaluated, one family of technologies shouldn't necessarily be discounted because of an assumption of a missing feature. Many of the file systems mentioned earlier can be combined in a multi-tiered storage hierarchy to offer the best performance, usability, and scalability, with the faster tiers being closer to the user and the slower data lakes serving as data archive. To optimize storage and its access, it is helpful to understand how data is accessed during the DL training process. Data records are repeatedly accessed in random order, so it is beneficial when the storage system can handle randomly accessed files quickly and efficiently. This can put pressure on the file systems metadata performance. While the first access of data may be slow, the subsequent accesses are going to control the overall DL training performance. For this reason, it is best when the data reads can be cached locally, either in ram or on local disk. In addition to reads, writes become increasingly important as models get larger in size. For very large models, write performance should be part of the consideration. When many models are trained at the same time, storage needs are amplified. Generally, choosing a storage solution that offers fast read IO along with data caching many times offers the overall best performance for most AI workloads. Data is a fundamental asset to any business, and the most important asset in an AI environment. Thus, accessibility and manageability of that data is critical. There are many very good networks, file systems, parallel file systems, and object storage technologies that can meet the rigorous demands of an AI data center. It is important to understand the benefits of each technology so they can be matched to the user and data center needs. There are many ways to measure performance of file systems, but the key performance metric for DL training is read and reread performance, the rate at which data can be accessed, is often correlated to the distance from the GPU, the closer the better. Therefore, using local system resources such as ram and local disk to cache data can increase training performance while also reducing stress on a shared storage system, preventing the need to over provision. When model sizes increase, write IO will also become more important. You cannot focus on the storage needs of training a single model. You will usually train multiple models at the same time, amplifying the storage needs. Nvidia has many partners providing best of breed storage technologies that are fully integrated, tested, and ready to deploy, and that reduce time to deployment and minimize system management risks. Now that you've completed this unit, you should be able identify the storage requirements necessary for AI workloads, explain the key concepts of storage file systems, and apply them in relevant scenarios. Comprehend the benefits of using validated storage partners in AI data center. Summarize storage considerations for AI workloads. Great progress. Don't stop here, continue the journey with Unit 10, Energy Efficient Computing. This subunit covers the various AI use cases and workflows that lend themselves to cloud deployments. So what types of AI related activities can you perform in the cloud? As AI-powered solutions continue to drive business innovation and growth, effective workload and infrastructure management is critical to optimize efficiency, scalability, and performance in the cloud. Cloud-based model deployment management enables organizations to deploy machine learning models at scale, automate deployment processes, and manage model versions and updates, all while leveraging the scalability and flexibility of the cloud. Cloud-native management and orchestration involves using cloud-native tools and technologies to manage and orchestrate cloud-based applications and services, including containerization, service meshes, and serverless computing, to achieve scalability, agility, and high availability. Cloud-based cluster management solutions enable organizations to easily deploy, manage, and scale containerized applications and services across distributed cloud environments, ensuring high availability, scalability, and cost-effectiveness. Infrastructure acceleration libraries, such as those for containerization and serverless computing, can help optimize cloud-based applications and services by providing pre-built components and tools for rapid deployment and scaling. Resulting in improved resource utilization, reduced latency, and increased scalability. The development and training of AI models are notoriously resource intensive, making cloud deployment an ideal choice for AI development activities. By leveraging cloud resources, organizations can tap into the scalability and flexibility they need to train and deploy their AI models efficiently and effectively. Data preparation is the process of preparing raw data and making it more suitable for machine learning models. Modeling training is teaching AI to accurately interpret and learn from data to perform a task with accuracy. In the simulate and test phase, you iteratively improve machine learning model accuracy to reduce error. For the deployment phase, you make models available to other systems so they can receive data and return predictions. Recall that Unit 5, AI Software Ecosystem, provided in-depth coverage of tools for various phases, and these same tools can be utilized within the cloud. By leveraging the power of cloud computing and the latest enhancements in AI development, organizations can unlock new possibilities for innovation and growth and stay ahead of the competition in today's fast-paced digital landscape. There are numerous AI use cases that are appropriate to deploy on the cloud. Given its scalability, flexibility, and cost-effectiveness, here are some of the most common AI use cases for cloud deployment. Deploying a large language model, or LLM, in the cloud accelerates natural language processing tasks, enabling businesses to leverage advanced linguistic capabilities, automate communication, and enhance customer interactions on a scalable and cost-effective platform. Speech AI use cases, including things like chatbots and virtual assistants to automate customer support and other business processes. Speech recognition services to transcribe audio and video recordings and improve the customer experience. Recommendation engines can be used to personalize customer experiences and improve customer satisfaction. Cloud based fraud detection and prevention services can be used to identify and prevent fraudulent activities in real time. Sentiment analysis is the process of using natural language processing and machine learning techniques to identify and quantify the emotional tone and subjective opinions expressed in text. Data supply chain optimization services can be used to optimize supply chain operations such as demand forecasting, inventory management, and route planning. Predictive maintenance services can be used to predict equipment failures and prevent unplanned downtime. This is just the tip of the iceberg. With many more applicable use cases, you are halfway through the unit. Great progress so far. Next up on the journey is subunit 12.4, which covers key considerations when deploying AI in the cloud. This sub unit explores the most widely used Cloud service providers, or CSP's, and their various consumption models, helping you make an informed decision for your organization's Cloud needs. Accelerate your AI journey with NVIDIA's support for all major CSP's. NVIDIA has partnered with the leading Cloud service providers to give you the freedom to deploy your AI workflows and solutions with the Cloud service provider of your choice, including Amazon Web Services, AWS, Microsoft Azure, Google Cloud Platform, GCP, and Oracle Cloud Infrastructure. The NVIDIA Cloud ecosystem is continually growing through ongoing collaborations with CSP's. Now you can seamlessly scale your AI workloads across different Cloud environments, ensuring the highest levels of flexibility, reliability, and performance. Now that we've covered the supported CSP's, let's delve into the various consumption models available across these platforms. Let's start with the on-premises model. As a baseline for comparing Cloud consumption models. In a traditional on premises deployment, the customer is responsible for managing the entire technology stack from the data center and network infrastructure to storage, physical servers, virtualization, operating systems, scaling, application code and data configuration. Cloud delivery models primarily differ in the division of responsibilities between the customer and the Cloud service provider for managing the full stack of resources required to run a workload. This contrasts with traditional on-premises infrastructure where the customer is responsible for managing the entire data center In Cloud computing infrastructure as a service, IaaS provides on demand infrastructure services managed by the Cloud service provider, or CSP, with the customer responsible for elements beyond infrastructure. Platform as a Service PaaS builds on IaaS, letting developers focus on coding, while CSP manages both hardware and software. Software as a Service SaaS delivers complete applications with the CSP handling all hosting components. Importantly, the level of responsibility shifts between the CSP and the customer depending on the consumption model. To effectively leverage AI in the Cloud, the first step is to understand your company's current Cloud strategy and how they're using the Cloud today. This includes understanding any enterprise Cloud contracts in place and the specific Cloud services being used such as IaaS or PaaS. This foundation will inform how AI can be leveraged to maximize benefits. With a solid grasp of the Cloud consumption models, let's dive into one of the most popular options for getting started with Cloud computing. What exactly is a Cloud marketplace and why should you use it? A Cloud Marketplace provides a one-stop shop. Cloud marketplaces offer a wide range of Cloud services from various providers, allowing you to compare and choose the best services for your needs. Easy discovery. Cloud Marketplaces provide a centralized platform for discovering and exploring different Cloud services making it easier to find the right services for your project. Streamlined procurement. With Cloud Marketplaces, you can quickly and easily purchase and deploy Cloud services, eliminating the need for lengthy procurement processes. Cost savings. Cloud marketplaces can help you save money by providing transparent pricing and allowing you to compare costs across different providers. Increased agility. With Cloud marketplaces you can quickly and easily scale your Cloud resources up or down to meet changing business needs. Increasing your agility and responsiveness to market demands. With Cloud marketplaces, you can quickly and easily get started with AI in the Cloud, streamline your procurement process, save money, and increase your agility in the ever changing AI landscape. Let's assess your understanding before concluding this unit with the final sub unit 12-6 NVIDIA's solutions in the Cloud. This section delves into NVIDIA's Cloud solutions demonstrating how they can empower you to unleash the complete capabilities of AI in the Cloud. Only one sub unit left way to go. The last sub unit covers NVIDIA's solutions in the Cloud where we detail all of the NVIDIA's solutions that can be used in the Cloud. See you in the final sub unit. Now that we have a solid understanding of GPU and CPU requirements and implementations in an AI data center, let's explore the third pillar of a data center, the DPU. DPUs are designed to meet the infrastructure requirements that modern data centers must offer for today's cloud computing and AI workloads, providing a secure and accelerated infrastructure. The best definition of the DPU's mission is to offload, accelerate, and isolate infrastructure workloads. It offloads by taking over infrastructure tasks from the server CPU, so more CPU power can be used to run applications. DPUs run infrastructure functions more quickly than the CPU can, using hardware acceleration in the DPU silicon, therefore, accelerating network traffic and improving applications performance. DPUs offer isolation by moving critical data plane and control plane functions to a separate domain on the DPU to relieve the server CPU from work and protect the functions in case the CPU or its software is compromised. The data processing unit, or DPU, is a data center infrastructure on a chip that enables organizations to build software defined hardware accelerated IT infrastructure. Running infrastructure services on the host CPU steals precious CPU cores, which impacts application performance and reduces efficiency, sometimes severely. The role of the DPU is to offload and isolate the infrastructure services from the host CPU and accelerate them in hardware by leveraging purpose built hardware accelerators, freeing up the host CPU for money making applications, and improving data center performance, efficiency, scalability, and security. A DPU has several specialized accelerators for networking, security and storage. These accelerators are designed to execute these tasks much more efficiently than the CPU cores, allowing you to process greater amounts of data more quickly and often using significantly less power. It can also run compute heavy tasks in environments where the physical footprint is limited, like in far edge applications. The NVIDIA BlueField-3 data processing unit, or DPU, is the third generation infrastructure compute platform that enables organizations to build software defined hardware accelerated IT infrastructures from cloud to core data center to edge. With 400 gigabytes per second Ethernet or NDR, 400 gigabytes per second InfiniBand network connectivity BlueField-3 DPU offloads, accelerates and isolates software defined networking, storage, security, and management functions in ways that profoundly improve data center performance, efficiency, and security. BlueField DPUs provide a secure and accelerated infrastructure by offloading, accelerating, and isolating a broad range of advanced networking, storage and security services. From cloud to core to edge, it increases efficiency and performance. Let's take a look at some prominent use cases for NVIDIA, BlueField DPUs. The world's largest cloud service providers, or CSPs, have adopted the DPU technology to optimize the data center infrastructure stack for incredible efficiency and scalability. BlueField is used in bare metal virtualized cloud data centers, and more recently also in Kubernetes clusters often running on a bare metal infrastructure. BlueField DPUs enable a secure infrastructure in bare metal clouds. For cybersecurity, we see BlueField used in next generation firewalls, NGFW, micro-segmentation, and all security applications enabling a zero trust security everywhere architecture where security goes beyond the data center perimeter to the edge of every server. HPC and AI, Telco enterprise storage and CDN content delivery networks are areas where BlueField adds much value in accelerated performance, new functionality, and more. By supporting NVME over Fabrics, NVME-oF, GPU direct storage, data integrity, decompression, and deduplication. BlueField provides high performance storage access for remote storage that rivals direct attached storage. Finally, BlueField reduces CPU cycles in video streaming by offloading and accelerating video streaming to the DPU. NVIDIA DOCA is the open cloud STK and acceleration framework for BlueField DPUs. By leveraging industry standard APIs, DOCA unlocks data center innovation by enabling the rapid creation of applications and services for BlueField DPUs. It supports BlueField-3 for empowering thousands of developers, simplifying the development for networking, storage, and accelerating infrastructure services in the cloud. Now that you have an understanding of the three pillars of the data center, GPU, CPU, and DPU, let's review the NVIDIA certified servers that provide an end to end platform for accelerated computing. An NVIDIA certified system brings together NVIDIA GPUs and NVIDIA networking onto systems from leading vendors. It conforms to NVIDIA's design best practices and has passed a set of certification tests that validate the best system configurations for performance, manageability, scalability, and security. With NVIDIA certified systems, enterprises can confidently choose performance optimized hardware solutions backed by enterprise grade support to securely and optimally run their accelerated computing workloads both in smaller configurations and at scale. NVIDIA certified servers help to secure workflows by protecting data at the platform, network, and application layers. Whether deployed in a data center or at the edge, laptops or desktops, customers can be assured that they don't have to compromise on security features when running accelerated applications. Certified servers bring together a whole set of technologies in server configurations that have been validated for the most optimal functionality. Depending on the choice of GPU and network adapter, workloads can benefit from numerous capabilities for performance, security and scalability. GPUs provide record setting acceleration of many algorithms in machine learning, deep learning, and data analytics. In addition to fast video processing and rendering, high speed interconnects allow data to be moved quickly to servers and directly to GPUs for faster processing. Network encryption offload for TLS and IPsec provide security for data in motion without compromising throughput as key management and secure boot features provide host level security. Accelerated data transfer between GPUs and servers unlocks efficient multi-node processing for the biggest tasks such as large AI model training. On the other extreme, multi- instance GPUs, which allow a single GPU to be split into multiple independent GPU instances, allow for dynamically scaling out within a host, enabling flexible utilization. Now that you've completed this unit, you should be able to indicate the key components and features of the NVIDIA data center platform. Identify the GPUs and CPUs requirements for AI data centers, the different products available, and their intended use cases. Understand the purpose and capabilities of multi-GPU systems. Describe the multi-node GPU interconnect technology. Determine the role of DPUs and DOCA in an AI data center. Evaluate the benefits of using NVIDIA certified systems. Continue the journey by taking the next unit, networking for AI. In this unit, we will be focusing on energy efficient computing and your data center. Let's cover the learning objectives for this unit. By the end of this unit, you should be able to thoughtfully articulate the steps in the planning and deployment of a data center, including the equipment that will be installed in the data center. First, we'll go over power consumption and cooling considerations. After that, we'll discuss how NVIDIA's technology is designed and optimized for efficiency. Next, you'll learn how NVIDIA mitigates negative impact in the data center with efficient cooling systems. Finally, you'll see how data center co location impacts and improves efficiency. NVIDIA strives for low environmental impact by ensuring GPUs consume fewer resources and run as efficiently as possible. Let's start by covering data center considerations and NVIDIA GPU architectures. Planning a data center deployment requires a balance between all five of the process domains, datacenter operations, IT operations, NOC support, the application owner, network operations. These five domains must continuously be coordinated and balanced to ensure a successful deployment. At a high level, data center resources can be categorized as power, cooling and space. Given that data centers have finite resources, a change in one of these resources impacts the other two. This drives the need to optimize resource utilization for efficiency. The graph above illustrates the recent explosion of energy demand in data centers due to the massive data, loads, complex data models, and the extreme processing power required to run today's applications and tools. As computing is sophisticated and realized new possibilities in today's applications, especially AI tools and applications, the need for data center resources such as power and space has increased substantially. Accelerated computing with GPU technology is optimized efficiency in the data center. This is because individual GPUs handle large scale compute intensive functions with less technology and require less space. If one were to compare work loads on either a CPU or a GPU, while the GPU requires more power consumption, the amount of time the work load runs is significantly reduced. The increase in power consumed at a given time is offset by the fact that the workload runs so quickly, thus using less energy over time. Another benefit is that multi instance GPUs allow users to partition the GPU and each partition can have their workloads run simultaneously while not increasing the power consumption of the GPU. Processing capabilities have grown exponentially in the past decade, fueled largely by supercomputers, data centers, and Cloud computing. A data center with NVIDIA GPUs requires a fraction of the space and energy. A hyperscale data center with NVIDIA GPUs takes up only 1/47th of the rack space of the CPU based systems that it replaces and runs at 93% lower energy cost for AI models. Software can significantly improve the energy efficiency of AI workloads. We're continuously optimizing our Kuta X libraries and GPU accelerated applications, so it's not unusual for users to see an X factor performance gain on the same GPU architecture. AI workloads on NVIDIA Ampere architecture improved by 2.5 X over the past two years. We offer the latest versions of AI and HPC software from the NVIDIA GPU Cloud or NGC portal to help users run applications with better performance on their supercomputer. In the data center or in the Cloud. We estimate an energy savings of 20% on NGC workloads because of users implementing performance suggestions. As supercomputers take on more workloads, CPUs are stretched to support a growing number of communication tasks needed to operate large and complex systems. Data processing units or DPUs, which move data around the data center, alleviate 30% or more of this stress by offloading some of these processes from the CPU. Some workloads achieve more than 50 X performance improvement, allowing fewer servers to be deployed, and reducing power of a modest data center by four megawatts. The zero trust protection platform enabled by NVIDIA DPUs brings a new level of security to data centers at speeds up to 600 times faster than servers without NVIDIA accelerations further reducing the amount of infrastructure and power it would require. Built for AI, the NVIDIA spectrum four Ethernet switch enables extreme networking performance and robust security with 40% lower power consumption compared to the previous generation. Adequate cooling is required to optimize supercomputer performance. We deploy state of the art technology designed for NVIDIA's server products using computational fluid dynamics models to enhance cooling for data center designs and server rack deployments. We use physics informed neural networks, or pins available in NVIDIA modulus to design heat sinks for our DGX systems. Cooling solutions are closely coupled with server racks to localize and optimize heat transfer. We share our data center best practices with customers and partners to help optimize their deployments. In partnership with leading storage and networking technology providers, we offer a portfolio of reference architectures for optimal and efficient deployment of our DGX server products and we make these publicly available on our corporate website. One of the superpowers of accelerated computing is energy efficiency in terms of application throughput per kilowatt hour. This is a simple study of several common HPC applications and their performance on the HGX H1004X GPU system compared to a dual socket Sapphire Rapids 84 ADC 52 cores per socket system. For a balanced amount of run time for each application, there is a Geoman performance advantage to the GPU systems of 22.9x so it would take 23 times the number of CPU servers to achieve the same throughput. We assumed that both the GPU and CPU systems ran at TDP thermal design power to estimate that a 50 node HG supercomputer would use 2.6 gigawatt hours annually and the CPU system with 1150 servers would require 12.1 gigawatt hours. Clearly, the accelerated platform has a big energy efficiency advantage. When looking at data requirements and compute capabilities of the A100 versus the H100 for deploying AI workloads including HGXH100 in the data center is an optimal solution. The H100 requires fewer servers in the data center while still managing the same workload as significantly more A100 more specifically 64 H100 clocks in at a third of the TCO, using a fifth of the server nodes, and is 3.5 times more energy efficient. The DGX H100 system is the compute building block of the DGX Super Pod and it offers benefits for extreme performance for AI workloads. To deliver such performance, it has specific power, environmental, and cooling requirements that must be met for optimal performance and operation. An important characteristic is that the system is air cooled and it requires that air temperature remains 5-30 degrees, or 41-86 degrees Fahrenheit. When a group of these powerful systems are brought online, even more challenges arise. Now that we've discussed NVIDIAs approaches to reduced rack space and power consumption, let's talk about data center cooling and how NVIDIA optimizes cooling to improve efficiency. Let's look at the HGXA100 and H100 as well as their associated PCIs. Liquid cooled GPU's require less power, are smaller, and as a result require less rack space to meet NVIDIAs efficiency targets. GPUs are the hottest thing in computing 99.99% of the power used by the chip is converted to heat. As CPU's grow in power, their heat output has increased from eight watts to 150 watts per chip. Consequently, the heat output of CPU racks can range from 4 kilowatts to 12 kilowatts. In comparison, GPUs run at 250 to 500 watts per chip, dramatically increasing heat. The heat output of a GPU rack can range from 12 kilowatts to 45 kilowatts. Next generation racks could increase to 60 kilowatts, while future racks can reach 100-120 kilowatts. These are the current options for cooling GPU chips. The first option is cold air. Cooling via cold air flow is inexpensive, but reaches its limit when cooling 30 kilowatts per rack. The second option is to use water cooled heat exchangers. In this case, heat rejection is more efficient, but more expensive. Because of this, water cooled heat exchangers are now the accepted standard for high density cooling. They can serve between 20 kilowatts and 60 kilowatts per rack. Some manufacturers claim that it can serve over 100 kilowatts per rack. Let's consider direct air cooled systems. The steps involved in this process are as follows. Computer room air handling units, or CRAH's use chilled water circulated through coils with large fans to blow the hot air over the coils to absorb the heat. The fans in the CRAH units pressurize the cold air under the server room floor to distribute the air to all the systems and racks around the room. Floor grates inserted in the raised floor inside the contained aisles allow the cold air to circulate in front of the racks on both sides of the aisle. The fans of the systems being cooled draw in the cold air from the aisle and the heat from the chips raise the air temperature before the system fans exhaust the hot air out from the back of the system. The hot air returns to the CRAH units in the ceiling and is then drawn into the CRAH by its fans, where the heat is transferred to the chilled water inside the coil. Let's consider the second cooling option, heat rejection to air or water. The characteristics of this option include rear doors with chilled water coils. Coils are only six inches from servers. The chilled water captures heat from the servers, heat is transferred to the exterior to be dissipated. Rear-door heat exchangers are often used on solid or slab floor facilities, while cold aisle containment air-cooled systems are often deployed in raised floor facilities. Data center power provisioning must be completed prior to connecting power to the in-rack power distribution units, or PDUs, and system deployment. NVIDIA recommends that each component be supplied with redundant power sources to increase system reliability. AC power redundancy should be validated at each rack. An electrician or facility representative must verify that the AC voltage and total kilowatts supplied is within specifications at each of the floor-mounted PDUs and individual circuits, that is, power drops that feed the racks. The equipment served by each circuit breaker within the PDU should be clearly labeled. Let's take a few minutes to talk about the benefits of NVIDIA's DGX Data Center Co-Location Program in saving data center resources and improving overall efficiency. Businesses are becoming increasingly aware of the advantages of accelerated computing with GPUs. NVIDIA and its partners are at the forefront of the adoption of GPU computing in the data center with DGX-based systems offering unprecedented compute density designed to handle the world's most complex AI challenges. The systems have been rapidly adopted by a wide range of organizations across dozens of countries. Internet service companies, healthcare facilities, government labs, financial institutions, oil and gas businesses, and more have all benefited from building and deploying their own DGX system-based AI data centers. However, some businesses don't have the modern data center facilities that can support accelerated computing operations. As discussed previously, a single DGX A100 system draws 6.5 kilowatts. NVIDIAs current DGX pod reference architecture draws 18-35 kilowatts per rack. Many enterprises cannot support more than 8-15 kilowatts per rack in their existing data centers, and many even less. With the NVIDIA DGX-Ready Data Center Program built on NVIDIA DGX systems and delivered by NVIDIA partners, you can accelerate your AI mission today. The newly enhanced program offers a pairing function that connects you with the best partner for your needs, and is now available in Asia, Australia, Europe, North America, and South America with more markets coming soon. Also, select partners are providing a broader range of services including test drive and GPU as a service offerings. Through co-location, customers can avoid the challenges of facilities planning or the high costs and latency of the public Cloud, and instead, focus on gaining insights from data while innovating. With this program, businesses can deploy NVIDIA DGX systems and recently announced DGX reference architecture solutions from DDN, IBM Storage, NetApp, Pure Storage, and Dell EMC with speed and simplicity at an affordable op-ex model. NVIDIA continues to strive for a net zero data center with improvements across the GPU hardware from generation to generation, as well as across networking equipment. From Ampere to Hopper, there was significant improvement. Combined with the ability to run AI workloads faster and more efficiently, the amount of time a GPU is in use is reduced. Let's wrap up this unit by briefly discussing NVIDIA's goals for net zero by deploying the most efficient data center servers as possible before summarizing what you learned. Now that you've completed this unit, you should be able to articulate the design and planning of a data center and see how space, power, and cooling consideration affect the plans, discuss how NVIDIA's methods and servers optimize energy efficiency in data centers, describe cooling architecture of GPUs to improve efficiency, understand how co-location improves efficiency. Don't stop here, continue the learning journey with Unit 11: AI Reference Architectures. See you there. Welcome to unit 8. Our journey of data center infrastructure leads us into the domain of AI networking. In this unit, we will begin with an overview of AI data center networks. Next, we will discuss the networking requirements for AI workloads. Following that, we'll delve into the networking technologies that can fulfill these requirements, including InfiniBand and Ethernet. To wrap up the unit, we'll provide an overview of the NVIDIA networking portfolio. By the end of this unit, you'll be able to explain the basics of AI data center networks, outline the networking requirements that are essential for AI data centers, summarize the main features of InfiniBand and Ethernet networking technologies employed in AI data centers, and provide an overview of the NVIDIA networking portfolio. Let's get started. Let's start with an overview of AI data center networks. A typical AI data center will have four networks. The compute network is designed to minimize system bottlenecks and maximize performance for the diverse nature of AI workloads. It also provides some redundancy in the event of hardware failures and minimizes costs. The storage network provides high throughput access to shared storage. High bandwidth requirements with advanced fabric management features provide significant benefits for the storage fabric. The in-band management network provides connectivity to the management nodes. It's the primary network for everything that isn't inter-job communication or high-speed storage access, such as cluster management services, for example, SSH, DNS, and job scheduling, access to the NFS home file system, and external services like the NGC registry, code repositories, or data sources. An out-of-band management network provides remote management functions even if servers are offline or unreachable on the in-band network. It provides remote power control, a remote serial console, and temperature and power sensors. A separate network ensures that management traffic does not interfere with other cluster services or user jobs. In the following section, we will learn about networking requirements for AI workloads. GPUs process data quickly and in parallel. To get the greatest efficiency from GPU-based systems, GPU utilization must remain as high as possible. This means that GPUs will need high bandwidth transfers for such large quantities of data. As AI continues to advance, the models and related datasets are growing. Therefore, large amounts of data must be stored and passed to the compute system. In addition to the large amount of data, many AI models must be run across multiple GPU nodes. This requires the transfer of data to and from GPUs. Given these complexities and requirements, it becomes evident that the performance of AI models on GPU-based systems is not solely dependent on the hardware. Rather, it's the interplay of data management, GPU utilization, and network configurations that truly drives performance. Now let's delve into the key networking factors that influence this performance. There are several networking-related key factors that affect the performance; network topology, bandwidth and latency, network protocols, data transferring techniques, and management methods. Some of them will be discussed in this unit. As computing requirements continue to grow, the network is critical for maximizing the acceleration provided by the GPU. In the world of GPU computing, the transfer of input and output data to the GPU is a very expensive task, in terms of time. To maximize GPU-based acceleration, data must always be available for the GPU to operate upon. Because the GPU has so many processing elements, this can become a challenge. Obviously, you wouldn't want those elements to be inactive. They want to be fed all the time. As you would expect, several techniques are employed to optimize data movement to the GPU. Within a server, this means the CPU, memory, and storage must support bandwidth speeds and latency that do not cause significant GPU idle time. What is the difference between a traditional network optimized for Cloud and a network that's optimized for AI data centers? The way to think about it is there are two different types of networks. On the left is a traditional north-south network. It runs storage, controls traffic, and is a legacy network. On the right is a network optimized for AI. It connects GPU to GPU, it may have high-speed storage, it's lossless for RDMA operations, and it has low latency. The legacy network runs TCP, while the network optimized for AI runs RDMA. Legacy networks can operate with high jitter, while AI-optimized networks must avoid jitter. There's also a difference in the applications that run on the infrastructure. When you build a traditional Cloud, you have a lot of applications that all run independently. That is, they have no dependencies on each other. When you build an AI data center, it's very similar to an HPC cluster. There's a lot of dependency between all the nodes and the slowest element in the fabric sets the speed for the entire fabric and the entire AI cluster. What could this element be? It could be a slow CPU, high jitter or high tail latency, for example. The key measurement for the AI optimized network is how long an AI training job takes from start to finish. Nvidia AI supercomputers are increasingly in use around the world, generally in two settings, AI factories and AI Clouds. AI factories are being used for the largest AI training tasks. AI factories require tremendously high computational horsepower and the AI supercomputers that power them are built to train and refine very large, complex foundational AI models at extreme scale. The only way these systems can deliver such towering performance is to use a specialized network comprising NVLink and InfiniBand within network computing,. It's also important to note that an AI factory typically has one tenant or user of that factory and one job or a handful of jobs working at the same time. In contrast, AI Clouds are hyperscale systems serving volumes of users, hosting multi-tenants, and running many less complex, smaller, and lower scale jobs. The decreased demand for scale and performance of these systems are effectively served using Ethernet as their common network. Until now, the demands on AI Clouds have grown in scale. Support is still needed for small scale jobs, multi-tenant, and security. But now, AI Clouds must occasionally provide reliable support for large workloads such as generative AI. Traditional Ethernet networks which are built for general Clouds or traditional data centers are just too slow for the new generative AI workloads. In this chapter, we'll review InfiniBand protocol and how it helps to maximize performance of the AI data center. InfiniBand is a networking technology designed to deliver both high throughput and low latency, while minimizing processing overhead. The InfiniBand specification is maintained by the InfiniBand Trade Association, IBTA, and provides a solution starting from the hardware layer and continuing to the application layer. InfiniBand is an interconnect technology that allows high speed connections between compute systems and storage. In systems with multiple compute and storage nodes, a technology like InfiniBand is necessary to ensure that data transfers are high bandwidth and efficient. Apart from its low latency and high bandwidth capabilities, the InfiniBand interconnect also introduces intelligent offloading features. InfiniBand is a favorite of traditional HPC. It runs scientific simulations and models on parallel clusters, as well as for Cloud data centers and GPU accelerated AI workloads. InfiniBand solutions are the most deployed high speed interconnect for large scale machine learning, used for both training and inference systems. One of InfiniBand's key features is remote direct memory access, or RDMA. Direct memory access, or DMA, is the ability of a device, such as a GPU or network adapter to access host memory directly without the intervention of the CPU. RDMA extends DMA with the ability to access memory on a remote system without interrupting the processing of the CPU on that system. InfiniBand network adapters, also called host channel adapters, or HCAs, include hardware offloading allowing for faster data movement with less CPU overhead as it bypasses the TCPIP stack altogether. In summary, RDMA offers efficient data transfer where the OS bypass enables the fastest access to remote data possible. Efficient computing that reduces power, cooling and space requirements, support for message passing, sockets, and storage protocols, and support by all major operating systems. In this chapter, we'll examine the Ethernet protocol and its role in optimizing the performance of AI data centers. Ethernet was introduced in 1979 and was first standardized in the 1980s as IEEE standards. Ethernet describes how network devices can format and transmit data to other devices on the same local area network. Ethernet has become the predominant LAN technology thanks to its ability to evolve and deliver higher levels of performance while also maintaining backward compatibility. Ethernet's original 10 megabits per second throughput increased to 100 megabits per second in the mid 1990s, and currently supports up to 400 gigabits per second. Ethernet is designed to suit the needs of a broad range of applications, ranging from home networks to corporate LANs, to data center interconnects. Naturally, each type of application has unique requirements and protocols it must support. InfiniBand, on the other hand, has one focus, which is to be the highest performance data center interconnect possible. RDMA over Converged Ethernet, or RoCE, is a technology that allows RDMA over Ethernet networks. RoCE uses the InfiniBand packet header and encapsulates it with a UDP header that's carried over the Ethernet network. UDP is a very simple and flexible transport protocol that offers a great deal of interoperability and compatibility with legacy hardware. By making use of UDP encapsulation, RoCE can transcend layer 3 networks. RoCE is an open source and formal infiniband trade association standard. RoCE is becoming an important technology fundamental to accelerating AI storage and big data applications. Even if InfiniBand's RDMA is leaner and meaner. GPU Direct RDMA provides direct communication between NVIDIA GPUs in remote systems. It provides a direct or peer to peer data path between the GPU memory directly to and from the NVIDIA networking adapters. It minimizes CPU utilization and the required buffer copies of data via the system memory. NVIDIA adapter cards have onboard processing power that can aggregate data and sent smart interrupts to the CPU, utilizing near 0% of processing cycles from the CPU. In order to understand GPU Direct RDMA, let's look at the regular packets flow first. Packets are received from a remote node by the host channel adapter. The packets are sent via the PCI bus and copied to the system memory. Packets are handled by the CPU and then copied, again, to the GPU memory via the PCI bus. With GPU Direct RDMA, the process is simplified. Packets are received by the host channel adapter and sent directly to the GPU for processing. To summarize, GPU Direct RDMA saves full copy operations, reduces PCI transactions and CPU usage, and improves end-to-end latency. This section will provide an overview of the NVIDIA networking portfolio. Growing AI workloads are increasing the demand for more computing power, efficiency, and scalability. To meet these needs, NVIDIA provides complete end-to-end solutions supporting InfiniBand and Ethernet networking technologies. The industry leading NVIDIA Connect X family of smart network interface cards, Smart Mix offers advanced hardware offloads and accelerations for AI workloads. The NVIDIA BlueField DPUs provide a secure and accelerated infrastructure for any workload in any environment from Cloud to data center to edge. The NVIDIA Spectrum Ethernet switch family includes a broad portfolio of top of rack and aggregation switches, delivering industry leading performance, scalability, and reliability across a wide range of applications. NVIDIA Quantum InfiniBand Switch family comprising fixed configuration and modular switches, provides the dramatic leap and performance needed to achieve unmatched data center performance with less cost and complexity. Finally, the NVIDIA LinkX product family of cables and transceivers provides the industry's most complete line of interconnect products for a wide range of applications. Let's explore the NVIDIA Spectrum-X switches specifically designed for Ethernet use within AI data centers. The NVIDIA Spectrum-X Networking platform is the first Ethernet platform designed, specifically, to improve the performance and efficiency of Ethernet based AI Clouds. This breakthrough technology achieves 1.6 times the effective bandwidth for AI workloads. Increasing AI performance and energy efficiency along with consistent, predictable performance in multi-tenant environments. Spectrum-X is an NVIDIA full stack solution that leverages network innovations that only work with the combination of the NVIDIA Spectrum for Ethernet switch and NVIDIA BlueField-3 data processing units or DPUs. The combination of the Spectrum four Ethernet switch NVIDIA BlueField-3 DPUs installed on compute servers improves the standard Ethernet protocol by minimizing congestion and latency while improving bandwidth. With that said, Spectrum-X uses standards based Ethernet and is fully interoperable with any device that communicates with Ethernet. Spectrum-X is a pioneering solution designed specifically for the AI landscape. It harnesses the potent synergy of the NVIDIA spectrum four Ethernet switch and the NVIDIA BlueField-3 DPU ensuring unmatched performance. This platform delivers exceptional performance across AI, machine learning, natural language processing, and various industry specific applications. Spectrum-X empowers organizations to elevate AI Cloud performance, enhance power efficiency, and achieve superior predictability and consistency. Crucially, it undergoes rigorous tuning and validation across the entire NVIDIA hardware and software stack, ensuring an unparalleled Ethernet solution for AI Clouds. NVIDIA's RoCE adaptive routing is a fine-grained load balancing technology that dynamically reroutes RDMA data to avoid congestion and provide optimal load balancing. It improves network utilization by selecting forwarding paths dynamically based on the state of the switch such as Q occupancy and port utilization. NVIDIA's RoCE Congestion Control is a mechanism used to reduce packet drops in lossy networks or congestion spreading in lossless networks. It limits the injection rate of flows at the ports causing congestion, thereby reducing switch buffer occupancy, decreasing latency, and improving burst tolerance. NVIDIA's RoCE adaptive routing and congestion control Ethernet enhancements require Spectrum four switch and BlueField-3 DPU to work in unison. Now that you have completed this unit, you should be able to explain the basics of AI data center networks, outline the networking requirements essential for AI data centers, summarize the main features of InfiniBand and Ethernet networking technologies employed in AI data centers, and provide an overview of the NVIDIA networking portfolio. Don't stop here, continue the learning journey with Unit 9, storage for AI. See you in the next unit. Welcome to Unit 11, we'll be covering NVIDIA reference architectures. In this unit, we'll give an overview of some reference architectures and their benefits. Next, we'll use the DGX BasePOD reference architecture to show the type of information that can be found in a reference architecture. Finally, we'll look briefly at the DGX SuperPOD and DGX GH200 reference architectures. By the end of this unit, you should be able explain the value of reference architectures, describe the information found in reference architectures, identify available NVIDIA reference architectures, and describe the components in the NVIDIA BasePOD reference architecture. Let's begin with an overview of reference architectures. Dense computing environments include many components. There are multiple servers for compute, networking fabrics that connect the systems, storage for data, and management servers. Designing systems to get maximum performance can be very difficult. Reference architectures are documents showing a recommended design for the implementation of a system. It uses best-of-breed designs to provide high performance solutions. NVIDIA has several reference architectures for datacenter scale computing environments. These include the DGX BasePOD, DGX SuperPOD, DGX GH200, NVIDIA AI Enterprise, and Cloudera Data Platform reference architectures. Reference architectures are design documents that are based on the best practices and design principles to get the most out of the system. Reference architectures can be used as a foundation for building designs using systems and components. Some of the benefits of using a reference architecture include, they show how a specific design can help solve problems. They give a foundational design that can be tailored to meet an organization's needs. They reduce cost and time for design and planning, which can lead to a faster solution, and they improve quality and reliability by reducing complexity. Let's review the reference architecture for the NVIDIA DGX BasePOD. We will look through the detailed information provided in this reference architecture, including the components in a DGX BasePOD and a variety of configurations available. The Nvidia DGX BasePOD provides the underlying infrastructure and software to accelerate deployment and execution of AI workloads. It is an integrated solution consisting of NVIDIA DGX systems, NVIDIA networking systems, NVIDIA Base Command software, and NVIDIA AI Enterprise software, as well as Partner Storage. Its reference architecture defines the components and connections to create DGX BasePODs with up to 40 DGX A100 systems or up to 16 DGX H100 systems. It covers the variety of configurations for a DGX BasePOD. The reference architecture document can be accessed via the link provided. With DGX BasePOD, we've taken proven NVIDIA networking products, plugged in our leading DGX systems for compute, paired that with storage solutions from trusted NVIDIA partners, and then used NVIDIA base command to glue it together. Combined with Nvidia AI Enterprise and MLOPs offerings, this turns what would otherwise be a collection of world leading components into a cohesive, full stack solution. As we'll see, the BasePOD reference architecture covers these concepts in detail to make it easier to incorporate the components into a system that solves problems. In the next few slides, we'll review the components of the DGX BasePOD reference architecture. As we'll see, the BasePOD reference architecture covers these concepts in detail to make it easier to incorporate the components into a system that solves problems. In the next few slides, we'll review the components of the DGX BasePOD reference architecture. The DGX BasePOD reference architecture provides overviews of the DGX A100 system and the DGX H100 system, as well as their specifications and connections. The DGX A100 system includes eight Nvidia A100 GPUs and is equipped with ConnectX-6 or Connect-X7 adapters for network connectivity. This is a great all around system for AI development. The DGX H100 system includes eight NVIDIA H100 GPUs, and is equipped with ConnectX-7 adapters for network connectivity. The dedicated transformer engine in the H100 GPU makes it ideal for solving large language models. The next components that the NVIDIA DGX BasePOD reference architecture covers are the ConnectX-6 and ConnectX-7 network adapters. Each of the adapters can be configured for InfiniBand or Ethernet connections. Typically, InfiniBand connections are used for the compute network and Ethernet is used for the storage, in band management and out of band management networks. The DGX BasePOD reference architecture also includes an overview of the NVIDIA switches that can be employed in DGX BasePOD configurations. This includes the QM9700 and QM8700 InfiniBand switches. NDR stands for next data rate, which is 400 gigabits per second. The QM9700 InfiniBand switch is used 400 gigabits per second or 200 gigabits per second data communication, with a DGX H100 system or DGX A100 system respectively, depending on the BasePOD configuration. HDR stands for high data rate, which is 200 gigabits per second. This QM8700 switch can be used with the DGX A100 system in BasePOD configurations. The SN5600 Ethernet switch is used for GPU-to-GPU fabrics and offers speeds between 10 gigabits Ethernet and 800 gigabits Ethernet. The SN4600 is used for in band management and can also be used for storage fabrics. It offers speeds between 1 gigabit Ethernet and 200 gigabits Ethernet. The SN2201 Ethernet switch is used for out of band management connections in the BasePOD configuration, with speeds between 1 gigabit Ethernet and 100 gigabits Ethernet. After covering the components independently, the DGX BasePOD reference architecture document shares the complete reference architectures. These show how the components are combined to make the different DGX BasePOD configurations. The configuration shown here is DGX A100 BasePOD with HDR 200 gigabits per second InfiniBand connectivity for up to 10 nodes. It uses the ConnectX-6 network adapters with the QM8700 switches for connecting the compute nodes through an InfiniBand fabric. The SN4600 switches are used for connecting the storage and management networks through Ethernet. The reference architecture also shows the configuration for an A100 HDR BasePOD with up to 40 nodes. This is the configuration for the DGX A100 BasePOD. In this configuration, the compute nodes are connected with the QM9700 InfiniBand switches using NDR 200 gigabits per second InfiniBand connectivity and the ConnectX-7 network adapters. Even though the QM9700 switch is used in this design, the network bandwidth is still 200 gigabits per second. Using the NDR switches allows more nodes to be connected with fewer switches. The NDR switches are also compatible with the DGX H100 system. The SN4600 Ethernet switches are used for connecting the storage and management networks. The final configuration in the reference architecture is the DGX H100 BasePOD with NDR 200 gigabits per second InfiniBand connectivity for the compute network using the QM9700 switches. This configuration uses the ConnectX-7 network adapters for connections. The storage and management network is an Ethernet network using the SN4600 switches. This design works for 2 to 16 DGX H100 systems. Now we'll take a high-level look at the DGX SuperPOD and DGX GH200 reference architectures. The NVIDIA DGX SuperPOD is the next generation artificial intelligence supercomputing infrastructure based on the DGX A100 system or the DGX H100 system. The reference architecture design introduces compute building blocks called scalable units, or SUs, enabling the modular deployment of a full 140 node DGX SuperPOD with the DGX A100 systems, or a full 127 node SuperPOD with the DGX H100 systems. The DGX SuperPOD design includes NVIDIA networking switches, software, storage, and NVIDIA AI Enterprise, a fully supported software suite optimized to streamline AI development and deployment. The DGX Superpod RA has been deployed at customer data centers and cloud service providers around the world. The NVIDIA DGX GH200 is a new class of AI supercomputer that fully connects up to 256 NVIDIA Grace Hopper Superchips into a singular GPU offering 144 terabytes of shared and coherent memory with linear scalability. Because the memory is coherent, all GPUs can access any memory location without conflict. The large memory size makes the DGX GH200 ideal for the large AI models when the entire model needs to be in memory. DGX GH200 is designed to handle terabyte class models for massive recommender systems, generative AI, and graph analytics. The DGX GH200 reference architecture documentation includes information on the Grace Hopper Superchip, relevant networking, software tools, storage requirements, and NVIDIA AI Enterprise. NVIDIA also has reference architectures that are not based on specific NVIDIA servers. Some examples include the NVIDIA AI Enterprise reference architecture and the Cloudera Data Platform reference architecture. These reference architectures include node configurations in addition to the network topology, deployment topology, and other resources to get the most out of these designs. Now that you've completed this unit, you should be able to explain the value of reference architectures, describe the information found in reference architectures. Identify available NVIDIA reference architectures, and describe the components in the NVIDIA BasePOD reference architecture. Don't stop here, continue the learning journey with Unit 12, AI in the Cloud. See you in the next unit. Now that we've covered the GPU and CPU solutions for AI data centers, let's see how to scale up with multi-GPU systems. As an AI solution scales, it's key to know how to scale solutions based on increased workload demand. There are two ways to scale the solution; scale up, which is referred to as multi-GPU, and scale out, which is referred to as multi-node. Let's compare each one of these options. Multi-GPU scaling refers to adding more GPUs to a single node to increase its computational power. Whereas multi-node scaling refers to adding more nodes to a system to increase its overall processing power. In terms of hardware requirements, multi-GPU scaling requires a node with multiple GPUs and a high speed interconnect to allow communication between the GPUs. While multi-node scaling requires multiple nodes, each with its own processing capabilities, connected through a network. Multi-GPU scaling usually involves distributing data across the GPUs for parallel processing. Whereas multi-node scaling involves distributing data across nodes for parallel processing. Multi-GPU scaling also requires load balancing between the GPUs, while multi-node scaling requires load balancing between the nodes. Lastly, multi-node scaling provides better failure tolerance compared to multi-GPU scaling, as the failure of one node does not affect the overall system. Whereas in multi-GPU scaling, the failure of one GPU can affect the entire system. In the following slides, we'll cover the scale-up option or multi-GPU. As AI solutions become more complex, there is an exceptional growth in the computing capacity. To meet this challenge, developers have turned to multi-GPU system implementations. In multi-GPU systems, one of the keys to continued performance scaling is flexible, high bandwidth communication between GPUs in the system. In traditional servers, this is accomplished by using PCIE. However, as workloads continue to get bigger and GPUs are able to churn through data faster, the bandwidth provided by PCIE has proved to be a bottleneck. To meet the challenge of communication between GPUs and a system, NVIDIA introduced NVLink chip-to-chip interconnect to connect multiple GPUs at speeds significantly faster than what PCIE offers, allowing GPUs to communicate between themselves at incredibly high speeds. But in all-to-all communications where all GPUs need to communicate with one another, this implementation requires certain GPU pairs to communicate over a much slower PCIE data path. To take GPU server performance to the next level and scale beyond eight GPUs in a single server, a more advanced solution was needed. With AI and HPC workloads, there are many common operations which require one GPU to talk to all the other GPUs in the system, such as distributing data to the other GPUs. Often this happens on all GPUs simultaneously, leading to many so-called all-to-all operations. NVIDIA NVSwitch technology enables direct communication between any GPU pair without bottlenecks. Each GPU uses NVLink interconnects to communicate with all NVSwitch Fabrics. This provides the maximum amount of bandwidth to communicate across GPUs over the links. Each DGX H100 system has eight H100 GPUs. NVIDIA NVSwitch provides high bandwidth inter-GPU communication. The system is configured with 10 NVIDIA ConnectX-7 network interfaces, each with a bandwidth of 400 gigabits per second. It can provide up to one terabyte per second of peak bidirectional network bandwidth. When a system is configured with two Intel Xeon Platinum 8480C processors, it has a total of 112 cores, which means that it can handle a large number of simultaneous tasks and computations. The DGX H100 has two terabytes of system memory, which is a massive amount of memory that can be used to store and process large amounts of data. The DGX H100 is configured with 30 terabytes of NVMe SSD storage. This large amount of high speed storage can be used to store and access data quickly. AI performance of 32 quadrillion floating point operations per second. Let's look at the physical specifications of the DGX H100 system. It is an eight-rack unit high chassis that fits in a standard 19-inch rack. The system is quite heavy and requires a mechanical lift to help get it out of the packaging and safely installed in a rack. The DGX H100 is physically constructed from a handful of modules, each handling discrete functions in the system. At the front shown on the far left side, we have the gold bezel that should be familiar to anyone who has seen a DGX before. Behind the bezel are the 12-dual fan modules. Below those, are the eight U2 NVMe drives used as a data cache, and the front console board with VGA and USB ports to connect a crash cart too. The front cage includes a power distribution board that connects the system to the power supplies shown at the rear of the system. That front cage also holds a mid plane which handles communication between the motherboard tray, the GPU tray, and the components at the front of the system. The DGX H100 system offers impressive GPU to GPU connectivity, thanks to the presence of four fourth generation NVLink switches. These switches enable high speed data transfer and parallel processing capabilities among the installed GPUs, making it suitable for AI and data intensive tasks. The GPU tray is found at the top rear of the system with the motherboard tray underneath. The chassis holds everything together in a nice modular package. In this final subunit, we'll explore Nvidia's AI cloud based solutions and how they can help you harness the power of AI in the cloud. Nvidia AI platform is designed to address the challenges of enterprise AI and meet customers where they're at. With this full stack platform, enterprises can use the power of AI to deliver business results irrespective of where they are in their AI adoption journey. Nvidia AI platform is cloud native. Customers can use it to develop and deploy AI-enabled applications anywhere, from any public cloud to enterprise data centers to edge locations, without being locked into any one cloud or deployment option. Using this stack as a reference, we'll look at what each layer of the Nvidia AI platform looks like in the public cloud and the different ways to drive the consumption of Nvidia technology for cloud customers. Lets start at the bottom of the stack and work our way up. The foundation of the Nvidia AI platform is accelerated infrastructure, which in the context of cloud computing refers to virtual machine instances equipped with Nvidia GPUs. Additionally, some cloud service providers or CSP's, incorporate Nvidia networking technologies to achieve at scale performance. The Nvidia virtual machine images, or VMIs, available through CSP marketplaces, underlie application software, whether sourced from the NGC catalog or generic AI software. Nvidia VMis provide an operating system environment for running Nvidia GPU accelerated software in the cloud. These VM images are built on top of Ubuntu OS and are packaged with core dependencies. VMIs provide a GPU optimized development environment for your GPU accelerated application on a cloud service provider's infrastructure. VMIs are essentially the operating system for cloud vms. VMIs sit on top of cloud instance types. A cloud instance type is a predefined virtual server configuration provided by a cloud service provider, specifying computing resources like CPU, memory, storage, and networking for virtual machines. Users choose instance types based on workload, scalability, and budget, with options like general purpose, compute optimized, memory optimized, and GPU instances. Now that you've grasped the cloud accelerated infrastructure layer, let's ascend the Nvidia AI stack to touch briefly on Nvidia AI enterprise. Let's talk about the next layer, which is the AI platform software layer. Nvidia AI Enterprise is the AI platform software layer. Software is what enables enterprise AI applications to leverage the power of the underlying accelerated infrastructure. Application performance has a direct tie in to operational costs in the cloud, which is why you want to make sure you're always getting the most of your compute resources. With Nvidia optimized and Enterprise-supported software, customers can get both the best performance from the accelerated infrastructure and accelerate time to solution. While we covered Nvidia AI Enterprise in Unit 5, it's essential to emphasize its deployability in the cloud. In fact, Nvidia AI enterprise represents a secure, end to end and cloud native AI software platform specifically designed for production AI workloads. The solution is available across multiple deployment environments, including public, hybrid, and multi clouds. Another key component of the AI platform in the cloud is NGC. Let's explore that next. Let's connect the dots between Nvidia AI Enterprise and the Nvidia NGC catalog. Nvidia NGC serves as a central hub for all Nvidia services, software and support, providing customers with a one stop shop for our AI offerings. With a subscription or license to Nvidia AI Enterprise, customers gain access to the enterprise catalog hosted on NGC, which includes AI workflows and new AI services. However, free software access through NGC does not provide the same level of benefits as Nvidia AI Enterprise, such as Enterprise support, SLAs, access to Nvidia AI experts, and exclusive Enterprise product differentiators. Let's continue up the Nvidia AI platform stack to AI services. The topmost layer of the Nvidia AI platform is the AI services layer. This is the newest addition to the Nvidia cloud portfolio. It is the highest level of abstraction at which customers can engage with our platform. It brings to bear value of the entire Nvidia AI platform to the end customer as an Nvidia managed service. Let's start with Nvidia DGX cloud and work our way up the AI services. Consider the following with traditional AI development on traditional clouds, DIY tools and open source software is used to patch together a solution. Inconsistent access to multinode scaling across regions, searching through community forums, and voluntary contributions to find answers to your questions if you're lucky. Escalating costs, add on fees for reserved instances, storage and data egress. Nvidia DGX Cloud is a multinode AI training as a service solution for Enterprise AI. Within a single service that's offered at an all in one monthly price, it brings together the Nvidia base command platform, Nvidia AI Enterprise software, Nvidia DGX infrastructure combined with access to Nvidia AI expertise and support. Customers can just open a browser to get started without having to procure, set up and manage an AI supercomputer on their own. As a service, DGX cloud is hosted across multiple public clouds like Oracle cloud infrastructure, Microsoft Azure and Google Cloud. Having gained a fundamental comprehension of the DGX cloud solution, let us delve into the realm of Nvidia AI Foundations. Nvidia AI Foundations is another suite of Nvidia managed cloud services. Powered by the Nvidia DGX cloud, Nvidia AI Foundations is a set of cloud services for enterprises to build and run custom generative AI by leveraging state-of-the-art foundation models for text language, visual media, and biology. There are currently two collections that are part of the Nvidia AI foundations. Nvidia introduced the NEMOTRON-3 8B Enterprise-ready family of models, which has been trained using responsibly sourced data. These models deliver results comparable to larger models but with a reduced inference cost. Ideal for global enterprises. These models support over 50 spoken and 35 coding languages. They find application in various scenarios, such as chat and Q&A applications, across diverse industries including healthcare, telecommunications, and financial services. Community models optimized by Nvidia for both throughput and latency using TensorRT-LLM, ensuring the utmost performance efficiency. Achieving a 2x higher inference on LLAMA 2 with TRT LLM, these models include LLAMA 2, Mistral, Stable Diffusion, and Code Llama. Streamlined for customization, all models are converted to the .Nemo format. This allows developers to make the most of Nemo's data preparation, guardrails, and advanced customization techniques, facilitating the fine tuning of these foundational models with proprietary data on DGX cloud. Explore the models using the fully accelerated Nvidia AI stack. Test the models directly from your browser through a GUI or app without the need for additional setup. Seamlessly connect Enterprise applications to Nvidia hosted API endpoints to assess the full potential of the models in real world applications. These models can be found in the Nvidia NGC catalog, are accessible on several CSP's, and are also featured on the hugging Face website. Let's delve into the uppermost tier of the Nvidia AI services stack, known as Nvidia AI Foundry. An AI foundry is a new kind of service for creating custom generative AI models. The service should provide pioneering state-of-the-art pre-trained models, utilities for effortless customization of models with proprietary data, cloud native infrastructure with accelerated capabilities. These elements come together to enable the creation of customized enterprise grade models at scale. The Nvidia AI Foundry service gives enterprises an end to end solution for creating custom generative AI models. It encapsulates three elements. Nvidia AI foundation models, which we covered earlier, encompasses state-of-the-art pre-trained models from Nvidia, along with Nvidia optimized community foundation models. The models are hosted in CSP's model catalog. The Nvidia NeMo framework provides tools for fine tuning models with enterprise grade runtime, incorporating guardrails optimizations and advanced customization techniques. Nvidia DGX Cloud, which we covered earlier, is a serverless AI training as a service platform for enterprise developers that runs on various hyperscalers and is first being introduced with Microsoft Azure. Users can rent Nvidia DGX cloud, now available on Azure, and it comes with Nvidia AI enterprise, including Nemo, to speed LLM customization. The output is a custom model container tuned with proprietary data, guardrails ,and inference runtime. Once customized, these enterprise proprietary models can be deployed virtually anywhere on accelerated computing with enterprise grade security, stability, and support using Nvidia AI Enterprise. Let's end the unit with a review of the ways in which you can consume Nvidia solutions on the cloud. In summary, the effective deployment and utilization of AI capabilities in the cloud requires a keen focus on consumption, encompassing both the allocation of cloud resources and the optimization of associated costs, in order to unlock the full potential of AI in driving business success. Nvidia accelerated infrastructure in the cloud is no doubt the foundational piece of making our technology available broadly to cloud customers. However, Nvidia has come a very long way from doing just that and have built an entire full stack platform that can now be consumed in the cloud, be it full stack consumption with the AI services provided with DGX cloud AI foundations or AI foundry AI services, software and infrastructure consumption with Nvidia AI enterprise software or infrastructure consumption with different layers of the Nvidia AI platform, combined with our integrations on CSP's customers have a path to use and derive value from Nvidia, even within cloud services that they may already use today. Now that you've completed this unit, you should be able to explain the various ways cloud computing enhances AI deployments. Describe the wide variety of AI use cases in cloud computing environments. Outline the key considerations and strategies when deploying AI in the cloud. Summarize the wide variety of cloud service providers that support Nvidia technologies and solutions. Categorize the various cloud consumption models when deciding how to consume cloud services. Evaluate Nvidia's cloud solutions and how they can benefit your workloads. Concluding this unit marks a significant milestone in our journey through the introduction to AI in the data center course. Fantastic progress so far. As we look forward, we're set to explore Unit 13, AI Data Center Management and Monitoring. See you in the next unit.

All of the Nvidia lecture notes

Four modules Module 1 Module 2 Module 3 Module 4