pyvideo · ELC · Jun 19, 2025 · Jun 24, 2025 · Jun 24, 2025
diff --git a/pydata-yerevan-2023/category.json b/pydata-yerevan-2023/category.json
@@ -0,0 +1,3 @@
+{
+  "title": "PyData Yerevan 2023"
+}
diff --git a/...ta-yerevan-2023/videos/adam-kulidjian-crafting-impactful-dashboards-for-your-clients.json b/...ta-yerevan-2023/videos/adam-kulidjian-crafting-impactful-dashboards-for-your-clients.json
@@ -0,0 +1,24 @@
+{
+  "description": "Adam Kulidjian, Chief Technology Officer at Zyphr Solutions Inc., provides a talk on \u201cCrafting Impactful Dashboards for Your Clients.\u201d\n\nCommunicating trends, patterns, and insights through data is integral to understanding the world quantitatively. This phenomenon is used in data science, business intelligence, data analytics, and generally across all scientific disciplines.\n\nA dashboard, particularly one that houses data visualization, is the most common way to do it. With the increased accessibility of dashboard creation tools to people using data, there is a need to effectively communicate data, tell compelling stories, and create affordances that allow others to explore the data themselves.\n\nThe talk offers a handful of heuristics and pragmatic questions that will help you build a better dashboard, regardless of your clients, industry, or use case.\n-\nwww.pydata.org\n\nPyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. \n\nPyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.\n\n00:00 Welcome!\n00:10 Help us add time stamps or captions to this video! See the description for details.\n\nWant to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVideoTimestamps",
+  "duration": 3089,
+  "language": "eng",
+  "recorded": "2024-05-22",
+  "related_urls": [
+    {
+      "label": "https://github.com/numfocus/YouTubeVideoTimestamps",
+      "url": "https://github.com/numfocus/YouTubeVideoTimestamps"
+    }
+  ],
+  "speakers": [
+    "Adam Kulidjian"
+  ],
+  "tags": [],
+  "thumbnail_url": "https://i.ytimg.com/vi/DZjCrLJ1xlk/maxresdefault.jpg",
+  "title": "Crafting Impactful Dashboards for Your Clients",
+  "videos": [
+    {
+      "type": "youtube",
+      "url": "https://www.youtube.com/watch?v=DZjCrLJ1xlk"
+    }
+  ]
+}
diff --git a/pydata-yerevan-2023/videos/aleksandr-sarachakov-revolutionizing-cancer-treatment.json b/pydata-yerevan-2023/videos/aleksandr-sarachakov-revolutionizing-cancer-treatment.json
@@ -0,0 +1,24 @@
+{
+  "description": "Aleksandr Sarachakov, Biomedical Imaging Team Lead at BostonGene, provides a talk on  \u201cRevolutionizing Cancer Treatment: Harnessing AI, Zarr, and AnnData for High-Speed Biomedical Imaging.\u201d \n\nZarr and AnnData, Python-based technologies, are revolutionizing the landscape of biomedical image processing, especially when paired with self-supervised learning (SSL).  Zarr, a chunked and compressed data storage format, enables the efficient handling of datasets found in biomedical applications. AnnData, a specialized framework for multi-dimensional annotated data, is crucial in managing and analyzing large-scale biomedical datasets.\n\nIn the context of SSL, these technologies boost the processing speed and reduce the computational load for handling high-resolution images and complex datasets. Zarr's ability to store multi-terabyte data in distributed and parallelized environments allows for faster processing and analysis of biomedical images. AnnData complements this by providing structured, annotated data that SSL models can efficiently learn from without extensive labeling. This combination reduces memory usage, making it feasible to handle biomedical images on a large scale. These advancements are pivotal for applications like cancer diagnosis, where rapid, accurate image analysis is critical.\n\nDuring the talk, our speaker explores:\n- how Zarr and AnnData facilitate scalable biomedical image processing, \n- outline their integration with SSL for cutting-edge research, \n- and discuss future developments in optimizing biomedical workflows.\n-\nwww.pydata.org\n\nPyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. \n\nPyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.\n\n00:00 Welcome!\n00:10 Help us add time stamps or captions to this video! See the description for details.\n\nWant to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVideoTimestamps",
+  "duration": 6191,
+  "language": "eng",
+  "recorded": "2024-11-07",
+  "related_urls": [
+    {
+      "label": "https://github.com/numfocus/YouTubeVideoTimestamps",
+      "url": "https://github.com/numfocus/YouTubeVideoTimestamps"
+    }
+  ],
+  "speakers": [
+    "Aleksandr Sarachakov"
+  ],
+  "tags": [],
+  "thumbnail_url": "https://i.ytimg.com/vi/Xik80kYLD5c/maxresdefault.jpg",
+  "title": "Revolutionizing Cancer Treatment: Harnessing AI, Zarr, and AnnData for High-Speed Biomedical Imaging",
+  "videos": [
+    {
+      "type": "youtube",
+      "url": "https://www.youtube.com/watch?v=Xik80kYLD5c"
+    }
+  ]
+}
diff --git a/pydata-yerevan-2023/videos/aleksei-gorin-machine-learning-approaches-in-neuroscience.json b/pydata-yerevan-2023/videos/aleksei-gorin-machine-learning-approaches-in-neuroscience.json
@@ -0,0 +1,24 @@
+{
+  "description": "Dr. Aleksei Gorin, a Neurobiologist and Senior Scientist at Emonomy, provides a talk on \u201cMachine Learning Approaches in Neuroscience.\u201d\n\nAlong with the growth of artificial intelligence and machine learning methodologies, neurobiologists are adopting modern machine learning techniques to tackle a broad spectrum of challenges. Those range from early disease diagnosis to the development of software capable of modeling behavior and natural neural networks.\n\nDuring the talk, Dr. Gorin explores the latest endeavors for integrating machine learning into neuroscience while:\n-discussing the achieved outcomes and their implications for the evolution of brain science methodologies,\n-examining key libraries in computational neuroscience, their role, and offering solutions in data analysis processes.\n-\nwww.pydata.org\n\nPyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. \n\nPyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.\n\n00:00 Welcome!\n00:10 Help us add time stamps or captions to this video! See the description for details.\n\nWant to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVideoTimestamps",
+  "duration": 3697,
+  "language": "eng",
+  "recorded": "2024-02-28",
+  "related_urls": [
+    {
+      "label": "https://github.com/numfocus/YouTubeVideoTimestamps",
+      "url": "https://github.com/numfocus/YouTubeVideoTimestamps"
+    }
+  ],
+  "speakers": [
+    "Aleksei Gorin"
+  ],
+  "tags": [],
+  "thumbnail_url": "https://i.ytimg.com/vi/i-8IeS9N7wA/maxresdefault.jpg",
+  "title": "Machine Learning Approaches in Neuroscience",
+  "videos": [
+    {
+      "type": "youtube",
+      "url": "https://www.youtube.com/watch?v=i-8IeS9N7wA"
+    }
+  ]
+}
diff --git a/pydata-yerevan-2023/videos/egor-romanov-performance-of-vector-databases.json b/pydata-yerevan-2023/videos/egor-romanov-performance-of-vector-databases.json
@@ -0,0 +1,28 @@
+{
+  "description": "Egor Romanov, Software Engineer at Supabase, provides a talk on \u201cPerformance of Vector Databases.\u201d\n\nThe talk delves into vector databases' performance, challenges, and potentialities and discovers their role in advancing AI applications like Retrieval-Augmented Generation (RAG).\n\nHigh-dimensional embeddings are integral to numerous machine learning applications, transforming raw data into compact representations for diverse algorithms. Vector databases are essential in managing and utilizing these vectors. \n\nTheir main purpose includes aiding operations, including distance computations, similarity evaluations, and nearest-neighbor searches within high-dimensional spaces. RAG leverages these embedding stores, unlocking significant potentialities in the AI domain. \n\nDuring the talk, Egor Romanov:\n\n- explores the process of creating a provider for Postgres, integrated with pgvector, within a Python performance evaluation framework,\n- conducts a similarity search test simulation showcasing the latent performance potential.\n\nAccess the talk notes at: https://shorturl.at/CGKL1\n-\nwww.pydata.org\n\nPyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. \n\nPyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.\n\n00:00 Welcome!\n00:10 Help us add time stamps or captions to this video! See the description for details.\n\nWant to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVideoTimestamps",
+  "duration": 2588,
+  "language": "eng",
+  "recorded": "2024-01-22",
+  "related_urls": [
+    {
+      "label": "https://github.com/numfocus/YouTubeVideoTimestamps",
+      "url": "https://github.com/numfocus/YouTubeVideoTimestamps"
+    },
+    {
+      "label": "https://shorturl.at/CGKL1",
+      "url": "https://shorturl.at/CGKL1"
+    }
+  ],
+  "speakers": [
+    "Egor Romanov"
+  ],
+  "tags": [],
+  "thumbnail_url": "https://i.ytimg.com/vi/-MYYB0QjV6I/maxresdefault.jpg",
+  "title": "Performance of Vector Databases",
+  "videos": [
+    {
+      "type": "youtube",
+      "url": "https://www.youtube.com/watch?v=-MYYB0QjV6I"
+    }
+  ]
+}
diff --git a/...23/videos/gabor-szarnyas-duckdb-the-power-of-a-data-warehouse-in-your-python-process.json b/...23/videos/gabor-szarnyas-duckdb-the-power-of-a-data-warehouse-in-your-python-process.json
@@ -0,0 +1,24 @@
+{
+  "description": "G\u00e1bor Sz\u00e1rnyas, a Developer Relations Advocate and Technical Writer at DuckDB Labs, provides a talk on \u201cDuckDB: The Power of a Data Warehouse in your Python Process.\u201d\n\nDuckDB is an in-process analytical database management system, a powerful data warehouse engine running inside the Python process without any setup or communication overhead.\n\nIt is an open-source and highly portable system available as a command line tool with R, NodeJS, and Julia clients;\n- which loads data from many formats, such as CSV and Parquet, as well as pandas data frames,\n-its speed and features allow it to tackle a remarkable number of use cases in data science \u2013 including data wrangling and running complex ad-hoc SQL queries \u2013 while running on a laptop.\n-\nwww.pydata.org\n\nPyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. \n\nPyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.\n\n00:00 Welcome!\n00:10 Help us add time stamps or captions to this video! See the description for details.\n\nWant to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVideoTimestamps",
+  "duration": 3326,
+  "language": "eng",
+  "recorded": "2023-10-23",
+  "related_urls": [
+    {
+      "label": "https://github.com/numfocus/YouTubeVideoTimestamps",
+      "url": "https://github.com/numfocus/YouTubeVideoTimestamps"
+    }
+  ],
+  "speakers": [
+    "Gábor Szárnyas"
+  ],
+  "tags": [],
+  "thumbnail_url": "https://i.ytimg.com/vi/q_SKaOeRiOI/maxresdefault.jpg",
+  "title": "DuckDB: The Power of a Data Warehouse in your Python Process",
+  "videos": [
+    {
+      "type": "youtube",
+      "url": "https://www.youtube.com/watch?v=q_SKaOeRiOI"
+    }
+  ]
+}
diff --git a/...rapetyan-karen-javadyan-langchain-a-framework-for-building-large-language-model-apps.json b/...rapetyan-karen-javadyan-langchain-a-framework-for-building-large-language-model-apps.json
@@ -0,0 +1,33 @@
+{
+  "description": "Gor Hayrapetyan, Lead/Senior Data Engineer at Microsoft, Estonia, and Karen Javadyan, Software Engineer at Snowflake, provide a talk on \u201cLangchain: A Framework for Building Large Language Model Apps.\u201d\n\nThe landscape of Large Language Models (LLM) and the libraries supporting them has recently had rapid evolution.\n\nDuring the talk, you will get a brief introduction to LLMs and learn about the current framework of LLM applications. Following this, you will discover Langchain features and concepts, including:\n- Integrations with different LLM models,\n- Chains,\n- Retrievers, \n- Tools,\n- Agents.\n\nTo put Langchain usage into perspective, the talk will also reflect on the RAG technique to expose LLM to your data.\n\nGitHub Repo: https://github.com/kajarenc/PyData-March-Langchain\n\nSlides: https://shorturl.at/ciovF\n-\nwww.pydata.org\n\nPyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. \n\nPyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.\n\n00:00 Welcome!\n00:10 Help us add time stamps or captions to this video! See the description for details.\n\nWant to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVideoTimestamps",
+  "duration": 4235,
+  "language": "eng",
+  "recorded": "2024-04-12",
+  "related_urls": [
+    {
+      "label": "https://shorturl.at/ciovF",
+      "url": "https://shorturl.at/ciovF"
+    },
+    {
+      "label": "https://github.com/kajarenc/PyData-March-Langchain",
+      "url": "https://github.com/kajarenc/PyData-March-Langchain"
+    },
+    {
+      "label": "https://github.com/numfocus/YouTubeVideoTimestamps",
+      "url": "https://github.com/numfocus/YouTubeVideoTimestamps"
+    }
+  ],
+  "speakers": [
+    "Gor Hayrapetyan",
+    "Karen Javadyan"
+  ],
+  "tags": [],
+  "thumbnail_url": "https://i.ytimg.com/vi/YNixBsPt7Ds/maxresdefault.jpg",
+  "title": "Langchain: A Framework for Building Large Language Model Apps",
+  "videos": [
+    {
+      "type": "youtube",
+      "url": "https://www.youtube.com/watch?v=YNixBsPt7Ds"
+    }
+  ]
+}
diff --git a/...moshkov-daria-gitman-how-to-build-an-llm-for-math-reasoning-without-proprietary-data.json b/...moshkov-daria-gitman-how-to-build-an-llm-for-math-reasoning-without-proprietary-data.json
@@ -0,0 +1,29 @@
+{
+  "description": "Ivan Moshkov, Deep Learning Engineer at NVIDIA, and Daria Gitman, Conversational AI Research Intern at NVIDIA provide a talk on \"How to Build an LLM for Math Reasoning without Proprietary Data?\"\n\nRecent research has shown the value of synthetically generated datasets in training LLMs to acquire targeted skills. Current large-scale math instruction tuning datasets such as MetaMathQA and MAmmoTH rely on outputs from closed-source LLMs that have commercially restrictive licenses. One key reason limiting the use of open-source LLMs in data generation pipelines is the gap in the mathematical skills between the best closed-source LLMs, such as GPT-4, and the best open-source LLMs. \n\nIn their research, Ivan and Daria constructed OpenMathInstruct-1, a math instruction tuning dataset with 1.8M problem-solution pairs using recent progress in open-source LLMs, proposed prompting novelty, and brute-force scaling. Their best model, OpenMath-CodeLlama-70B, trained on a subset of OpenMathInstruct-1, achieves a competitive score of 84.6% on GSM8K and 50.7% on MATH, comparable to top GPT-distilled models. \n\nDuring the talk, Ivan introduces the challenge of math reasoning in Natural Language Processing and discusses the process of creating their synthetic dataset. Following this, Daria explores the Data Explorer tool and shares key insights extracted from the data using this tool. \n\nSlides: https://shorturl.at/GRUFi \n-\nwww.pydata.org\n\nPyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. \n\nPyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.\n\n00:00 Welcome!\n00:10 Help us add time stamps or captions to this video! See the description for details.\n\nWant to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVideoTimestamps",
+  "duration": 3671,
+  "language": "eng",
+  "recorded": "2024-07-24",
+  "related_urls": [
+    {
+      "label": "https://github.com/numfocus/YouTubeVideoTimestamps",
+      "url": "https://github.com/numfocus/YouTubeVideoTimestamps"
+    },
+    {
+      "label": "https://shorturl.at/GRUFi",
+      "url": "https://shorturl.at/GRUFi"
+    }
+  ],
+  "speakers": [
+    "Ivan Moshkov",
+    "Daria Gitman"
+  ],
+  "tags": [],
+  "thumbnail_url": "https://i.ytimg.com/vi/prPLAxYF1bU/maxresdefault.jpg",
+  "title": "How to Build an LLM for Math Reasoning without Proprietary Data?",
+  "videos": [
+    {
+      "type": "youtube",
+      "url": "https://www.youtube.com/watch?v=prPLAxYF1bU"
+    }
+  ]
+}