Canonical’s Charmed Kubeflow is now certified as part of the NVIDIA DGX-Ready Software program. This collaboration between Canonical and NVIDIA aims to accelerate at-scale deployments of AI projects and makes open-source MLOps accessible on high-performance hardware for AI training.
Canonical and NVIDIA teams have performed a suite of tests, on both single-node and multi-node DGX systems, to validate the functionality of Charmed Kubeflow on top of both MicroK8s and Charmed Kubernetes. With the ability to run the entire machine learning workflow on this stack, enterprises unlock quicker experimentation and faster delivery for AI initiatives.
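As a sketch of what a single-node setup looks like, the following commands deploy Charmed Kubeflow on MicroK8s using Juju. The snap channel, add-on list, and MetalLB address range shown here are illustrative assumptions; check Canonical’s current documentation for the versions validated on your DGX system:

```shell
# Install MicroK8s (channel is illustrative; pick a Kubeflow-supported Kubernetes version)
sudo snap install microk8s --classic --channel=1.24/stable

# Enable the add-ons Charmed Kubeflow relies on, including a load-balancer address range
sudo microk8s enable dns hostpath-storage ingress metallb:10.64.140.43-10.64.140.49

# Install Juju, the operator lifecycle manager that drives charmed deployments
sudo snap install juju --classic

# Bootstrap a Juju controller onto MicroK8s and create a model for Kubeflow
juju bootstrap microk8s
juju add-model kubeflow

# Deploy the full Charmed Kubeflow bundle; --trust grants the required cluster permissions
juju deploy kubeflow --trust

# Watch deployment progress until all units report active status
juju status --watch 5s
```

The same bundle deploys onto Charmed Kubernetes by bootstrapping the Juju controller against that cluster instead of MicroK8s.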
According to McKinsey’s The State of AI in 2022 report, 5% of enterprises’ digital budgets goes to AI. This results in more projects, but also puts pressure on teams to take projects to production to prove their value. Enterprises’ needs are shifting from simply identifying use cases to running AI at scale. Capabilities that affect both the AI/ML models and the overall infrastructure, such as compute power, workflow automation, and continuous monitoring, are in high demand. These capabilities are directly covered by Canonical offerings, including Charmed Kubeflow.
Charmed Kubeflow is an open-source, end-to-end MLOps platform that runs on top of Kubernetes. It is designed to automate machine learning workflows, creating a reliable application layer where models can be moved to production. Additionally, with a bundle that includes tools like KServe and Knative, inference and serving capabilities are enhanced regardless of the ML framework that is used. Charmed Kubeflow can be used with AI tools and frameworks like NVIDIA Triton Inference Server for model serving to enhance the stack.
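To illustrate the serving side, once the bundle is running, a model can be exposed through KServe by applying an InferenceService manifest. The example below is a hedged sketch: the namespace, service name, and scikit-learn model URI are illustrative placeholders, not part of the bundle itself:

```shell
# Hypothetical example: serve a scikit-learn model with KServe.
# Namespace, name, and storageUri are placeholders for illustration only.
kubectl apply -f - <<EOF
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-demo
  namespace: kubeflow-user
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
EOF

# Check that the inference endpoint becomes ready
kubectl get inferenceservice sklearn-demo -n kubeflow-user
```

Because KServe abstracts the runtime behind the `modelFormat` field, the same workflow applies when swapping in another serving runtime, such as NVIDIA Triton Inference Server.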
NVIDIA DGX systems are purpose-built for enterprise AI use cases and built on optimised Ubuntu, offering the best of both worlds – the latest hardware updates and secure infrastructure to run ML workloads. Additionally, DGX systems include NVIDIA AI Enterprise, the software layer of the NVIDIA AI platform, which includes over 50 frameworks and pretrained models to accelerate development.
In 2018, OpenAI reported that the amount of compute used in large-scale AI training runs had been doubling every 3.4 months since 2012. Over the same period, the volume of data generated also increased dramatically.
Traditional, general-purpose enterprise infrastructure cannot deliver the required computing power, nor can it support the petabytes of data required to train accurate AI models at this scale. Instead, enterprises need dedicated hardware designed for AI workloads.
AI infrastructure solutions such as NVIDIA DGX systems accelerate the ROI of AI initiatives with a proven platform optimised for the unique demands of enterprise AI. Businesses can pair their DGX environment with MLOps solutions to operationalise AI development at scale. MLOps platforms such as Canonical’s Charmed Kubeflow are tested and optimised to work on DGX systems, ensuring that users can get the most out of their AI infrastructure without worrying about manually integrating and configuring their MLOps software.