Artificial Intelligence Technical Community Group

The Artificial Intelligence Technical Community Group (TCG) focuses on artificial intelligence and machine learning in cloud native environments, exploring AI/ML workloads, frameworks, infrastructure requirements, and best practices for running AI systems at scale.

Mission

The AI TCG serves as a community rallying point for discussion, knowledge sharing, and coordination of initiatives related to artificial intelligence and machine learning in the cloud native ecosystem. We aim to bridge the gap between traditional AI/ML practices and cloud native architectures.

Focus Areas

AI/ML Infrastructure

Exploring infrastructure requirements and patterns for running AI/ML workloads in cloud native environments:

  • Compute Resources: GPU/TPU scheduling, resource allocation, and optimization
  • Storage Solutions: Managing large datasets, model artifacts, and training data
  • Networking: High-performance networking for distributed training
  • Resource Management: Efficient utilization of expensive AI/ML hardware

Model Lifecycle Management

Covering the full lifecycle of machine learning models:

  • Training at Scale: Distributed training patterns and frameworks
  • Model Serving: Inference deployment patterns and performance optimization
  • Model Versioning: Managing model versions and experiments
  • Model Monitoring: Performance tracking and drift detection

MLOps Practices

Best practices for operationalizing machine learning:

  • CI/CD for ML: Automated testing, validation, and deployment pipelines
  • Experiment Tracking: Managing experiments, hyperparameters, and results
  • Feature Stores: Managing and serving features for training and inference
  • Model Registry: Centralized model artifact management

AI Governance

Addressing governance, compliance, and ethical considerations:

  • Model Explainability: Making AI decisions transparent and understandable
  • Bias Detection: Identifying and mitigating bias in models and data
  • Compliance: Meeting regulatory requirements for AI systems
  • Security: Protecting models, data, and inference endpoints

Cloud Native AI Patterns

Exploring patterns specific to cloud native AI:

  • Containerized Workflows: Packaging AI/ML workloads in containers
  • Kubernetes for AI: Leveraging Kubernetes for AI/ML orchestration
  • Serverless AI: Event-driven and serverless AI architectures
  • Edge AI: Deploying models to edge devices and environments

Getting Involved

The AI TCG welcomes participation from anyone interested in artificial intelligence and machine learning in cloud native environments.

Join the Community

  • CNCF Community Groups: Find us on community.cncf.io
  • CNCF Slack: Join slack.cncf.io and look for AI-related channels
  • Meetings: Check community.cncf.io for meeting schedules and join links

Contribute

  • Share Experiences: Present your AI/ML use cases and lessons learned
  • Participate in Initiatives: Help with white papers, guides, or other deliverables
  • Ask Questions: Bring your challenges and learn from the community
  • Provide Feedback: Help shape the direction of AI in cloud native

Topics for Discussion

Some areas where community input is valuable:

  • Kubernetes operators for AI/ML frameworks (TensorFlow, PyTorch, etc.)
  • GPU sharing and scheduling strategies
  • Cost optimization for AI/ML workloads
  • Data pipeline patterns for ML
  • Real-time inference architectures
  • Federated learning in cloud native environments
  • Integration with CNCF projects (Knative, Kubeflow, etc.)

Organizers

Technical Community Groups are led by community organizers who facilitate meetings, coordinate activities, and ensure the group delivers value to participants.

To become an organizer or learn more about current organizers, check the group's page on community.cncf.io or reach out via the CNCF Slack.

Resources

CNCF Resources

External Resources

Evolution and Future

As a Technical Community Group, the AI TCG may evolve based on community needs:

  • Current State: Discussion and knowledge-sharing forum
  • Near Term: Deliver initiatives like best practices guides or reference architectures
  • Long Term: Potentially evolve into a Technical Advisory Group (TAG) if the group shows sustained value and formal oversight is needed

The community drives the direction and pace of this evolution.

Contact

For questions or to get involved:

Code of Conduct

All participants in the Artificial Intelligence TCG must adhere to the CNCF Code of Conduct. We are committed to providing a welcoming and inclusive environment for all community members.