Harnessing the Power of Multi-GPU Training with PyTorch Distributed Data Parallel (DDP) #191

@abhijeet-dhumal

Description

Title of the talk

Multi-GPU ML training using PyTorch DDP

Description

As the scale and complexity of deep learning models continue to grow, efficient training strategies have become crucial for accelerating innovation and pushing the boundaries of AI research and deployment. Multi-GPU training has emerged as a game-changer, enabling faster model convergence and the ability to handle larger datasets and models. Among the various approaches available, PyTorch’s Distributed Data Parallel (DDP) stands out as a powerful and efficient solution designed for scalability and performance.

Table of contents

Topics of interest include, but are not limited to:

  • Introduction to PyTorch DDP (Distributed Data Parallel)
  • Best practices for setting up and using PyTorch DDP for multi-GPU training.
  • Practical demo on training a simple neural network on the MNIST dataset using PyTorch DDP (a minimal sketch of such a setup follows this list).
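
For reference, below is a minimal sketch of the kind of MNIST demo described above. It assumes torch and torchvision are installed and that the script is launched with torchrun on a multi-GPU node; the SimpleNet model, hyperparameters, and data path are illustrative choices rather than the exact demo code from the talk.

```python
# A minimal sketch of multi-GPU training with PyTorch DDP on MNIST.
# Assumes launch via torchrun, which sets RANK, LOCAL_RANK and WORLD_SIZE
# for each spawned process. Model and hyperparameters are illustrative.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler
from torchvision import datasets, transforms


class SimpleNet(nn.Module):
    """A small fully connected network for 28x28 MNIST images."""

    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 10),
        )

    def forward(self, x):
        return self.layers(x)


def main():
    # One process per GPU; NCCL is the usual backend for GPU training.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # DistributedSampler gives each process a non-overlapping shard of the data.
    transform = transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
    )
    dataset = datasets.MNIST("./data", train=True, download=True, transform=transform)
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    # Wrapping the model in DDP makes gradients get all-reduced across GPUs.
    model = SimpleNet().cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for images, labels in loader:
            images, labels = images.cuda(local_rank), labels.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()  # DDP synchronizes gradients during backward
            optimizer.step()
        if dist.get_rank() == 0:
            print(f"epoch {epoch} done, last loss {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Saved as, say, mnist_ddp.py (a hypothetical filename), the sketch could be launched on a single node with two GPUs via `torchrun --nproc_per_node=2 mnist_ddp.py`; torchrun spawns one process per GPU and sets the environment variables that init_process_group and the script rely on.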

Duration (including Q&A)

25 mins

Prerequisites

No prerequisites are required.

Speaker bio

  • Amita Sharma (Red Hat OpenShift AI, Kubeflow Training Team: Technical Project Manager)
  • Abhijeet Dhumal (Red Hat OpenShift AI, Kubeflow Training Team: Engineer) @abhijeet-dhumal

The talk/workshop speaker agrees to

    Labels

    scheduled, talk-proposal
