From 0aaff11a75af93822af0c0db3a8fc1d6b8091dd8 Mon Sep 17 00:00:00 2001 From: Timna Brown <24630902+brown9804@users.noreply.github.com> Date: Mon, 28 Apr 2025 22:48:15 -0600 Subject: [PATCH 1/5] Fabric+ai_demostration --- msFabric-AI_integration/README.md | 403 ++++++++++++++++++++++++++++++ 1 file changed, 403 insertions(+) create mode 100644 msFabric-AI_integration/README.md diff --git a/msFabric-AI_integration/README.md b/msFabric-AI_integration/README.md new file mode 100644 index 0000000..59099bc --- /dev/null +++ b/msFabric-AI_integration/README.md @@ -0,0 +1,403 @@ +# Demostration: How to integrate AI in Microsoft Fabric + +Costa Rica + +[![GitHub](https://img.shields.io/badge/--181717?logo=github&logoColor=ffffff)](https://github.com/) +[brown9804](https://github.com/brown9804) + +Last updated: 2025-04-21 + +------------------------------------------ + +> Fabric's OneLake datastore provides a unified data storage solution that supports differents data formats and sources. This feature simplifies data access and management, enabling efficient data preparation and model training. + +
+List of References (Click to expand) + +- [Unleashing the Power of Microsoft Fabric and SynapseML](https://blog.fabric.microsoft.com/en-us/blog/unleashing-the-power-of-synapseml-and-microsoft-fabric-a-guide-to-qa-on-pdf-documents-2) +- [Building a RAG application with Microsoft Fabric](https://techcommunity.microsoft.com/t5/startups-at-microsoft/building-high-scale-rag-applications-with-microsoft-fabric/ba-p/4217816) +- [Building Custom AI Applications with Microsoft Fabric: Implementing Retrieval-Augmented Generation](https://support.fabric.microsoft.com/en-us/blog/building-custom-ai-applications-with-microsoft-fabric-implementing-retrieval-augmented-generation-for-enhanced-language-models?ft=Alicia%20Li%20%28ASA%29:author) +- [Avail the Power of Microsoft Fabric from within Azure Machine Learning](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/avail-the-power-of-microsoft-fabric-from-within-azure-machine/ba-p/3980702) +- [AI and Machine Learning on Databricks - Azure Databricks | Microsoft Learn]( https://learn.microsoft.com/en-us/azure/databricks/machine-learning) +- [Training and Inference of LLMs with PyTorch Fully Sharded Data Parallel](https://techcommunity.microsoft.com/t5/microsoft-developer-community/training-and-inference-of-llms-with-pytorch-fully-sharded-data/ba-p/3845995) +- [Harness the Power of LangChain in Microsoft Fabric for Advanced Document Summarization](https://blog.fabric.microsoft.com/en-us/blog/harness-the-power-of-langchain-in-microsoft-fabric-for-advanced-document-summarization) +- [Integrating Azure AI and Microsoft Fabric for Next-Gen AI Solutions](https://build.microsoft.com/en-US/sessions/91971ab3-93e4-429d-b2d7-5b60b2729b72) +- [Generative AI with Microsoft Fabric](https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/generative-ai-with-microsoft-fabric/ba-p/4219444) +- [Harness Microsoft Fabric AI Skill to Unlock Context-Rich Insights from Your Data](https://blog.fabric.microsoft.com/en-us/blog/harness-microsoft-fabric-ai-skill-to-unlock-context-rich-insights-from-your-data) +- [LangChain-AzureOpenAI Parameter API Reference](https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.azure.AzureChatOpenAI.html#) + +
+ +
+Table of Content (Click to expand) + +- [Overview](#overview) +- [Demo](#demo) + - [Set Up Your Environment](#set-up-your-environment) + - [Install Required Libraries](#install-required-libraries) + - [Configure Azure OpenAI Service](#configure-azure-openai-service) + - [Basic Usage of LangChain Transformer](#basic-usage-of-langchain-transformer) + - [Using LangChain for Large Scale Literature Review](#using-langchain-for-large-scale-literature-review) + - [Machine Learning Integration with Microsoft Fabric](#machine-learning-integration-with-microsoft-fabric) + +
+ +## Overview + +> Microsoft Fabric is a comprehensive data analytics platform that brings together various data services to provide an end-to-end solution for data engineering, data science, data warehousing, real-time analytics, and business intelligence. It's designed to simplify the process of working with data and to enable organizations to gain insights more efficiently.

+> Capabilities Enabled by LLMs:
+> +> - `Document Summarization`: LLMs can process and summarize large documents, making it easier to extract key information.
+> - `Question Answering:` Users can perform Q&A tasks on PDF documents, allowing for interactive data exploration.
+> - `Embedding Generation`: LLMs can generate embeddings for document chunks, which can be stored in a vector store for efficient search and retrieval. + +## Demo + +Tools in practice: + +| **Tool** | **Description**| +|--------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| **LangChain**| LangChain is a framework for developing applications powered by language models. It can be used with Azure OpenAI to build applications that require natural language understanding and generation.
**Use Case**: Creating complex applications that involve multiple steps or stages of processing, such as preprocessing text data, applying a language model, and postprocessing the results. | +| **SynapseML**| SynapseML is an open-source library that simplifies the creation of massively scalable machine learning pipelines. It integrates with Azure OpenAI to provide distributed computing capabilities, allowing you to apply large language models at scale.
**Use Case**: Applying powerful language models to massive amounts of data, enabling scenarios like batch processing of text data or large-scale text analytics. | + +### Set Up Your Environment + +1. **Register the Resource Provider**: Ensure that the `microsoft.fabric` resource provider is registered in your subscription. + + image + +2. **Create a Microsoft Fabric Resource**: + - Navigate to the Azure Portal. + - Create a new resource of type **Microsoft Fabric**. + - Choose the appropriate subscription, resource group, capacity name, region, size, and administrator. + + image + +3. **Enable Fabric Capacity in Power BI**: + - Go to the Power BI workspace. + - Select the Fabric capacity license and the Fabric resource created in Azure. + + image + +4. **Pause Fabric Compute When Not in Use**: To save costs, remember to pause the Fabric compute in Azure when you're not using it. + + image + +### Install Required Libraries + +1. **Access Microsoft Fabric**: + - Open your web browser and navigate to the Microsoft Fabric portal. + - Sign in with your Azure credentials. +2. **Select Your Workspace**: From the Microsoft Fabric home page, select the workspace where you want to configure SynapseML. +3. **Create a New Cluster**: + - Within the **Data Science** component, you should find options to create a new cluster. + + image + + - Follow the prompts to configure and create your cluster, specifying the details such as cluster name, region, node size, and node count. + + image + + image + +4. **Install SynapseML on Your Cluster**: Configure your cluster to include the SynapseML package. + + image + + image + + ~~~ + %pip show synapseml + ~~~ + +5. **Install LangChain and Other Dependencies**: + > You can use `%pip install` to install the necessary packages + + ```python + %pip install openai langchain_community + ``` + + Or you can use the environment configuration: + + image + + You can also try with the `.yml file` approach. Just upload your list of dependencies. E.g: + + ```yml + dependencies: + - pip: + - synapseml==1.0.8 + - langchain==0.3.4 + - langchain_community==0.3.4 + - openai==1.53.0 + - langchain.openai==0.2.4 + ``` + +### Configure Azure OpenAI Service + +> [!NOTE] +> Click [here](./src/fabric-llms-overview_sample.ipynb) to see all notebook + +1. **Set Up API Keys**: Ensure you have the API key and endpoint URL for your deployed model. Set these as environment variables + + image + + ```python + import os + + # Set the API version for the Azure OpenAI service + os.environ["OPENAI_API_VERSION"] = "2023-08-01-preview" + + # Set the base URL for the Azure OpenAI service + os.environ["AZURE_OPENAI_ENDPOINT"] = "https://your-resource-name.openai.azure.com" + + # Set the API key for Azure OpenAI + os.environ["AZURE_OPENAI_API_KEY"] = "your-api-key" + ``` + +2. **Initialize Azure OpenAI Class**: Create an instance of the Azure OpenAI class using the environment variables set above. + + image + + ```python + from langchain_openai import AzureChatOpenAI + + # Set the API base URL + api_base = os.environ["AZURE_OPENAI_ENDPOINT"] + + # Create an instance of the Azure OpenAI Class + llm = AzureChatOpenAI( + openai_api_key=os.environ["AZURE_OPENAI_API_KEY"], + temperature=0.7, + verbose=True, + top_p=0.9 + ) + ``` + +3. **Call the Deployed Model**: Use the Azure OpenAI service to generate text or perform other language model tasks. Here's an example of generating a response based on a prompt + + image + + ```python + # Define a prompt + messages = [ + ( + "system", + "You are a helpful assistant that translates English to French. Translate the user sentence.", + ), + ("human", "Hi, how are you?"), + ] + + # Generate a response from the Azure OpenAI service using the invoke method + ai_msg = llm.invoke(messages) + + # Print the response + print(ai_msg) + ``` + +Make sure to replace `"your_openai_api_key"`, `"https://your_openai_api_base/"`, `"your_deployment_name"`, and `"your_model_name"` with your actual API key, base URL, deployment name, and model name from your Azure OpenAI instance. This example demonstrates how to configure and use an existing Azure OpenAI instance in Microsoft Fabric. + +### Basic Usage of LangChain Transformer + +> [!NOTE] +> E.g: Automate the process of generating definitions for technology terms using a language model. +> `The LangChain Transformer` is a tool that makes it easy to use advanced language models for `generating and transforming text`. It works by `setting up a template for what you want to create, linking this template to a language model, and then processing your data to produce the desired output`. This setup `helps automate tasks like defining technology terms or generating other text-based content`, making your workflow smoother and more efficient. + +> `LangChain Transformer helps you automate the process of generating and transforming text data using advanced language models`, making it easier to integrate AI capabilities into your data workflows.
+> +> 1. `Prompt Creation`: Start by `defining a template for the kind of text you want to generate or analyze`. For example, you might create a prompt that asks the model to define a specific technology term.
+> 2. `Chain Setup`: Then `set up a chain that links this prompt to a language model`. This chain is responsible for sending the prompt to the model and receiving the generated response.
+> 3. `Transformer Configuration`: The LangChain Transformer is `configured to use this chain`. It specifies how the `input data (like a list of technology names) should be processed and what kind of output (like definitions) should be produced`.
+> 4. `Data Processing`: Finally, `apply this setup to a dataset.` E.g., list of technology names in a DataFrame, and the transformer will use the language model to generate definitions for each technology. + +1. **Create a Prompt Template**: Define a prompt template for generating definitions. + + image + + ```python + from langchain.prompts import PromptTemplate + + copy_prompt = PromptTemplate( + input_variables=["technology"], + template="Define the following word: {technology}", + ) + ``` + +2. **Set Up an LLMChain**: Create an LLMChain with the defined prompt template. + + image + + ```python + from langchain.chains import LLMChain + + chain = LLMChain(llm=llm, prompt=copy_prompt) + ``` + +3. **Configure LangChain Transformer**: Set up the LangChain transformer to execute the processing chain. + + image + + ```python + # Set up the LangChain transformer to execute the processing chain. + from synapse.ml.cognitive.langchain import LangchainTransformer + + openai_api_key= os.environ["AZURE_OPENAI_API_KEY"] + + transformer = ( + LangchainTransformer() + .setInputCol("technology") + .setOutputCol("definition") + .setChain(chain) + .setSubscriptionKey(openai_api_key) + .setUrl(api_base) + ) + ``` + +4. **Create a Test DataFrame**: Construct a DataFrame with technology names. + + image + + ```python + from pyspark.sql import SparkSession + from pyspark.sql.functions import udf + from pyspark.sql.types import StringType + + # Initialize Spark session + spark = SparkSession.builder.appName("example").getOrCreate() + + # Construct a DataFrame with technology names + df = spark.createDataFrame( + [ + (0, "docker"), (1, "spark"), (2, "python") + ], + ["label", "technology"] + ) + + # Define a simple UDF to transform the technology column + def transform_technology(tech): + return tech.upper() + + # Register the UDF + transform_udf = udf(transform_technology, StringType()) + + # Apply the UDF to the DataFrame + transformed_df = df.withColumn("transformed_technology", transform_udf(df["technology"])) + + # Show the transformed DataFrame + transformed_df.show() + ``` + +### Using LangChain for Large Scale Literature Review + +> [!NOTE] +> E.g: Automating the extraction and summarization of academic papers: script for an agent using LangChain to extract content from an online PDF and generate a prompt based on that content. +> An `agent` in the context of programming and artificial intelligence is a `software entity that performs tasks autonomously`. It can interact with its`environment, make decisions, and execute actions based on predefined rules or learned behavior.` + +1. **Define Functions for Content Extraction and Prompt Generation**: Extract content from PDFs linked in arXiv papers and generate prompts for extracting specific information. + + image + + ```python + from langchain.document_loaders import OnlinePDFLoader + + def paper_content_extraction(inputs: dict) -> dict: + arxiv_link = inputs["arxiv_link"] + loader = OnlinePDFLoader(arxiv_link) + pages = loader.load_and_split() + return {"paper_content": pages[0].page_content + pages[1].page_content} + + def prompt_generation(inputs: dict) -> dict: + output = inputs["Output"] + prompt = ( + "find the paper title, author, summary in the paper description below, output them. " + "After that, Use websearch to find out 3 recent papers of the first author in the author section below " + "(first author is the first name separated by comma) and list the paper titles in bullet points: " + "\n" + output + "." + ) + return {"prompt": prompt} + ``` + +2. **Create a Sequential Chain for Information Extraction**: Set up a chain to extract structured information from an arXiv link + + image + + ```python + from langchain.chains import TransformChain, SimpleSequentialChain + + paper_content_extraction_chain = TransformChain( + input_variables=["arxiv_link"], + output_variables=["paper_content"], + transform=paper_content_extraction, + verbose=False, + ) + + paper_summarizer_template = """ + You are a paper summarizer, given the paper content, it is your job to summarize the paper into a short summary, + and extract authors and paper title from the paper content. + """ + ``` + +### Machine Learning Integration with Microsoft Fabric + +1. **Train and Register Machine Learning Models**: Use Microsoft Fabric's native integration with the MLflow framework to log the trained machine learning models, the used hyperparameters, and evaluation metrics. + + image + + ```python + import mlflow + from mlflow.models import infer_signature + from sklearn.datasets import make_regression + from sklearn.ensemble import RandomForestRegressor + + # Generate synthetic regression data + X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False) + + # Model parameters + params = {"n_estimators": 3, "random_state": 42} + + # Model tags for MLflow + model_tags = { + "project_name": "grocery-forecasting", + "store_dept": "produce", + "team": "stores-ml", + "project_quarter": "Q3-2023" + } + + # Log MLflow entities + with mlflow.start_run() as run: + # Train the model + model = RandomForestRegressor(**params).fit(X, y) + + # Infer the model signature + signature = infer_signature(X, model.predict(X)) + + # Log parameters and the model + mlflow.log_params(params) + mlflow.sklearn.log_model(model, artifact_path="sklearn-model", signature=signature) + + # Register the model with tags + model_uri = f"runs:/{run.info.run_id}/sklearn-model" + model_version = mlflow.register_model(model_uri, "RandomForestRegressionModel", tags=model_tags) + + # Output model registration details + print(f"Model Name: {model_version.name}") + print(f"Model Version: {model_version.version}") + ``` + +2. **Compare and Filter Machine Learning Models**: Use MLflow to search among multiple models saved within the workspace. + + image + + ```python + from pprint import pprint + from mlflow.tracking import MlflowClient + + client = MlflowClient() + for rm in client.search_registered_models(): + pprint(dict(rm), indent=4) + ``` + +
+

Total Visitors

+ Visitor Count +
From e6ea1e87193de184b98fd2a3a337adc70267b99b Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Tue, 29 Apr 2025 04:48:45 +0000 Subject: [PATCH 2/5] Update last modified date in Markdown files --- msFabric-AI_integration/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/msFabric-AI_integration/README.md b/msFabric-AI_integration/README.md index 59099bc..e5e93ea 100644 --- a/msFabric-AI_integration/README.md +++ b/msFabric-AI_integration/README.md @@ -5,7 +5,7 @@ Costa Rica [![GitHub](https://img.shields.io/badge/--181717?logo=github&logoColor=ffffff)](https://github.com/) [brown9804](https://github.com/brown9804) -Last updated: 2025-04-21 +Last updated: 2025-04-29 ------------------------------------------ From 2535ec26fff5dbe85bef93212344eae523bd0d6d Mon Sep 17 00:00:00 2001 From: Timna Brown <24630902+brown9804@users.noreply.github.com> Date: Mon, 28 Apr 2025 22:52:41 -0600 Subject: [PATCH 3/5] + format --- README.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 18e4f0d..278e35f 100644 --- a/README.md +++ b/README.md @@ -9,6 +9,7 @@ Last updated: 2025-04-29 ------------------------------------------ +> Azure Machine Learning (PaaS) is a cloud-based platform from Microsoft designed to help `data scientists and machine learning engineers build, train, deploy, and manage machine learning models at scale`. It supports the `entire machine learning lifecycle, from data preparation and experimentation to deployment and monitoring.` It provides powerful tools for `both code-first and low-code users`, including Jupyter notebooks, drag-and-drop interfaces, and automated machine learning (AutoML). `Azure ML integrates seamlessly with other Azure services and supports popular frameworks like TensorFlow, PyTorch, and Scikit-learn.`
List of References (Click to expand) @@ -30,8 +31,10 @@ Last updated: 2025-04-29
-> Azure Machine Learning (PaaS) is a cloud-based platform from Microsoft designed to help `data scientists and machine learning engineers build, train, deploy, and manage machine learning models at scale`. It supports the `entire machine learning lifecycle, from data preparation and experimentation to deployment and monitoring.` It provides powerful tools for `both code-first and low-code users`, including Jupyter notebooks, drag-and-drop interfaces, and automated machine learning (AutoML). `Azure ML integrates seamlessly with other Azure services and supports popular frameworks like TensorFlow, PyTorch, and Scikit-learn.` +> Topics: +- [Demostration: How to integrate AI in Microsoft Fabric](./msFabric-AI_integration/) + https://github.com/user-attachments/assets/c199156f-96cf-4ed0-a8b5-c88db3e7a552 | Feature / Platform | Azure Machine Learning | Microsoft Fabric | Azure AI Foundry | From b619f548da206f2d8d019ac81123373d380cea47 Mon Sep 17 00:00:00 2001 From: Timna Brown <24630902+brown9804@users.noreply.github.com> Date: Mon, 28 Apr 2025 22:53:54 -0600 Subject: [PATCH 4/5] + fabric-llms-overview_sample.ipynb --- .../src/fabric-llms-overview_sample.ipynb | 1202 +++++++++++++++++ 1 file changed, 1202 insertions(+) create mode 100644 msFabric-AI_integration/src/fabric-llms-overview_sample.ipynb diff --git a/msFabric-AI_integration/src/fabric-llms-overview_sample.ipynb b/msFabric-AI_integration/src/fabric-llms-overview_sample.ipynb new file mode 100644 index 0000000..d6563cf --- /dev/null +++ b/msFabric-AI_integration/src/fabric-llms-overview_sample.ipynb @@ -0,0 +1,1202 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "519955e9-2dad-456d-93db-a332d38e9433", + "metadata": { + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "# Demostration: How to integrate AI in Microsoft Fabric" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "d312e8d9-03fe-4b3d-aa6d-c52e3022ae39", + "metadata": { + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + } + }, + "outputs": [ + { + "data": { + "application/vnd.livy.statement-meta+json": { + "execution_finish_time": "2024-10-31T03:58:26.7170509Z", + "execution_start_time": "2024-10-31T03:58:19.270951Z", + "livy_statement_state": "available", + "normalized_state": "finished", + "parent_msg_id": "e267b6ab-5133-4598-8251-d64374cd11e5", + "queued_time": "2024-10-31T03:58:18.9132075Z", + "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325", + "session_start_time": null, + "spark_pool": null, + "state": "finished", + "statement_id": 5, + "statement_ids": [ + 5 + ] + }, + "text/plain": [ + "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 5, Finished, Available, Finished)" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Name: synapseml\r\n", + "Version: 1.0.8\r\n", + "Summary: Synapse Machine Learning\r\n", + "Home-page: https://github.com/Microsoft/SynapseML\r\n", + "Author: Microsoft\r\n", + "Author-email: synapseml-support@microsoft.com\r\n", + "License: MIT\r\n", + "Location: /home/trusted-service-user/cluster-env/clonedenv/lib/python3.11/site-packages\r\n", + "Requires: \r\n", + "Required-by: \r\n" + ] + } + ], + "source": [ + "!pip show synapseml" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "427610d0-3fae-45e3-8150-92ee7674f44c", + "metadata": { + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [ + { + "data": { + "application/vnd.livy.statement-meta+json": { + "execution_finish_time": "2024-10-31T03:58:28.6254349Z", + "execution_start_time": "2024-10-31T03:58:27.1124616Z", + "livy_statement_state": "available", + "normalized_state": "finished", + "parent_msg_id": "0e9f6c0f-062b-4e5d-9061-afcd89c8fd75", + "queued_time": "2024-10-31T03:58:19.3223486Z", + "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325", + "session_start_time": null, + "spark_pool": null, + "state": "finished", + "statement_id": 6, + "statement_ids": [ + 6 + ] + }, + "text/plain": [ + "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 6, Finished, Available, Finished)" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Name: langchain-openai\r\n", + "Version: 0.2.4\r\n", + "Summary: An integration package connecting OpenAI and LangChain\r\n", + "Home-page: https://github.com/langchain-ai/langchain\r\n", + "Author: \r\n", + "Author-email: \r\n", + "License: MIT\r\n", + "Location: /home/trusted-service-user/cluster-env/clonedenv/lib/python3.11/site-packages\r\n", + "Requires: langchain-core, openai, tiktoken\r\n", + "Required-by: \r\n" + ] + } + ], + "source": [ + "!pip show langchain-openai" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "baeeb853-2104-4edf-abf4-4d4be50cb977", + "metadata": { + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [ + { + "data": { + "application/vnd.livy.statement-meta+json": { + "execution_finish_time": "2024-10-31T03:58:30.5465258Z", + "execution_start_time": "2024-10-31T03:58:29.0000586Z", + "livy_statement_state": "available", + "normalized_state": "finished", + "parent_msg_id": "716d9975-263b-4d92-b25c-b342106f5f43", + "queued_time": "2024-10-31T03:58:19.511824Z", + "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325", + "session_start_time": null, + "spark_pool": null, + "state": "finished", + "statement_id": 7, + "statement_ids": [ + 7 + ] + }, + "text/plain": [ + "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 7, Finished, Available, Finished)" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Name: langchain\r\n", + "Version: 0.3.6\r\n", + "Summary: Building applications with LLMs through composability\r\n", + "Home-page: https://github.com/langchain-ai/langchain\r\n", + "Author: \r\n", + "Author-email: \r\n", + "License: MIT\r\n", + "Location: /home/trusted-service-user/cluster-env/clonedenv/lib/python3.11/site-packages\r\n", + "Requires: aiohttp, langchain-core, langchain-text-splitters, langsmith, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity\r\n", + "Required-by: langchain-community\r\n" + ] + } + ], + "source": [ + "!pip show langchain" + ] + }, + { + "cell_type": "markdown", + "id": "c58cc406-c4f5-4607-a740-0802e8e4b550", + "metadata": { + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Ensure you have the API key and endpoint URL for your deployed model. Set these as environment variables" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "3c8ada7c-2632-4c69-86d2-f5260ee8f1b7", + "metadata": { + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [ + { + "data": { + "application/vnd.livy.statement-meta+json": { + "execution_finish_time": "2024-10-31T04:20:14.3495341Z", + "execution_start_time": "2024-10-31T04:20:14.1128215Z", + "livy_statement_state": "available", + "normalized_state": "finished", + "parent_msg_id": "2573bf75-fe6d-40dc-b9f6-e06ebb9f7f73", + "queued_time": "2024-10-31T04:20:13.6194485Z", + "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325", + "session_start_time": null, + "spark_pool": null, + "state": "finished", + "statement_id": 22, + "statement_ids": [ + 22 + ] + }, + "text/plain": [ + "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 22, Finished, Available, Finished)" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import os\n", + "\n", + "os.environ[\"OPENAI_API_VERSION\"] = \"2023-08-01-preview\"\n", + "os.environ[\"AZURE_OPENAI_ENDPOINT\"] = \"https://your-resource.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2024-08-01-preview\"\n", + "os.environ[\"AZURE_OPENAI_API_KEY\"] = \"your-value\"" + ] + }, + { + "cell_type": "markdown", + "id": "3fac48a9-45fb-4e86-9792-8ee340b0ac60", + "metadata": { + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Create an instance of the Azure OpenAI class using the environment variables set above" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "5db10350-8000-4cbd-9bdf-d7da62d7fe61", + "metadata": { + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [ + { + "data": { + "application/vnd.livy.statement-meta+json": { + "execution_finish_time": "2024-10-31T04:20:14.9382032Z", + "execution_start_time": "2024-10-31T04:20:14.7083469Z", + "livy_statement_state": "available", + "normalized_state": "finished", + "parent_msg_id": "7dfaca5a-f738-4010-bba1-f764ea70f450", + "queued_time": "2024-10-31T04:20:14.027325Z", + "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325", + "session_start_time": null, + "spark_pool": null, + "state": "finished", + "statement_id": 23, + "statement_ids": [ + 23 + ] + }, + "text/plain": [ + "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 23, Finished, Available, Finished)" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from langchain_openai import AzureChatOpenAI\n", + "\n", + "# Set the API base URL\n", + "api_base = os.environ[\"AZURE_OPENAI_ENDPOINT\"]\n", + "\n", + "# Create an instance of the Azure OpenAI Class\n", + "llm = AzureChatOpenAI(\n", + " openai_api_key=os.environ[\"AZURE_OPENAI_API_KEY\"],\n", + " temperature=0.7,\n", + " verbose=True,\n", + " top_p=0.9\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "b17d7450-34b5-4ece-8e20-a77ddcdd93c4", + "metadata": { + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Use the Azure OpenAI service to generate text or perform other language model tasks" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "cfc5fd62-085a-4eff-9192-696d9f249a8e", + "metadata": { + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [ + { + "data": { + "application/vnd.livy.statement-meta+json": { + "execution_finish_time": "2024-10-31T04:20:16.0500538Z", + "execution_start_time": "2024-10-31T04:20:15.2936074Z", + "livy_statement_state": "available", + "normalized_state": "finished", + "parent_msg_id": "e14e4d0b-1fd0-4dac-a07d-6479d6536ce3", + "queued_time": "2024-10-31T04:20:14.4969185Z", + "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325", + "session_start_time": null, + "spark_pool": null, + "state": "finished", + "statement_id": 24, + "statement_ids": [ + 24 + ] + }, + "text/plain": [ + "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 24, Finished, Available, Finished)" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "content='Salut, comment ça va ?' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 6, 'prompt_tokens': 33, 'total_tokens': 39, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'gpt-4o-mini', 'system_fingerprint': 'fp_d54531d9eb', 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {}}], 'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'protected_material_code': {'filtered': False, 'detected': False}, 'protected_material_text': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}} id='run-8cb7f29a-44c1-4f65-a648-15afb2d793dc-0' usage_metadata={'input_tokens': 33, 'output_tokens': 6, 'total_tokens': 39, 'input_token_details': {}, 'output_token_details': {}}\n" + ] + } + ], + "source": [ + "# Define a prompt\n", + "messages = [\n", + " (\n", + " \"system\",\n", + " \"You are a helpful assistant that translates English to French. Translate the user sentence.\",\n", + " ),\n", + " (\"human\", \"Hi, how are you?\"),\n", + "]\n", + "\n", + "# Generate a response from the Azure OpenAI service using the invoke method\n", + "ai_msg = llm.invoke(messages)\n", + "\n", + "# Print the response\n", + "print(ai_msg)" + ] + }, + { + "cell_type": "markdown", + "id": "79729106-c7f1-4879-bc2b-871b50c2ac9a", + "metadata": { + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Define a prompt template for generating definitions" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "ca633361-c27b-4294-b8a7-9fc4a316afa4", + "metadata": { + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [ + { + "data": { + "application/vnd.livy.statement-meta+json": { + "execution_finish_time": "2024-10-31T04:20:16.587491Z", + "execution_start_time": "2024-10-31T04:20:16.3655978Z", + "livy_statement_state": "available", + "normalized_state": "finished", + "parent_msg_id": "cc3215f4-71a5-4231-af47-9bd9a8f5698a", + "queued_time": "2024-10-31T04:20:14.7799392Z", + "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325", + "session_start_time": null, + "spark_pool": null, + "state": "finished", + "statement_id": 25, + "statement_ids": [ + 25 + ] + }, + "text/plain": [ + "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 25, Finished, Available, Finished)" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from langchain.prompts import PromptTemplate\n", + "\n", + "copy_prompt = PromptTemplate(\n", + " input_variables=[\"technology\"],\n", + " template=\"Define the following word: {technology}\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "899839d9-adca-4042-b662-73edcad7e432", + "metadata": { + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Create an LLMChain with the defined prompt template" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "bd4f65ca-049b-481d-bbbd-a017c6c0119b", + "metadata": { + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [ + { + "data": { + "application/vnd.livy.statement-meta+json": { + "execution_finish_time": "2024-10-31T04:20:17.1233668Z", + "execution_start_time": "2024-10-31T04:20:16.9052959Z", + "livy_statement_state": "available", + "normalized_state": "finished", + "parent_msg_id": "27790d83-509f-4716-bb69-9c288ad069ba", + "queued_time": "2024-10-31T04:20:15.1325692Z", + "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325", + "session_start_time": null, + "spark_pool": null, + "state": "finished", + "statement_id": 26, + "statement_ids": [ + 26 + ] + }, + "text/plain": [ + "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 26, Finished, Available, Finished)" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from langchain.chains import LLMChain\n", + "\n", + "chain = LLMChain(llm=llm, prompt=copy_prompt)\n" + ] + }, + { + "cell_type": "markdown", + "id": "936b3ddf-cc65-436c-ba4e-ae0abe21fc2c", + "metadata": { + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Set up the LangChain transformer to execute the processing chain\n" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "63a00038-37b4-49ee-9c53-128c8acf9d01", + "metadata": { + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [ + { + "data": { + "application/vnd.livy.statement-meta+json": { + "execution_finish_time": "2024-10-31T04:20:18.181457Z", + "execution_start_time": "2024-10-31T04:20:17.4351576Z", + "livy_statement_state": "available", + "normalized_state": "finished", + "parent_msg_id": "3fb30420-f0c9-477b-ad1a-001dc0d8d37a", + "queued_time": "2024-10-31T04:20:15.6799013Z", + "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325", + "session_start_time": null, + "spark_pool": null, + "state": "finished", + "statement_id": 27, + "statement_ids": [ + 27 + ] + }, + "text/plain": [ + "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 27, Finished, Available, Finished)" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from synapse.ml.cognitive.langchain import LangchainTransformer\n", + "\n", + "openai_api_key= os.environ[\"AZURE_OPENAI_API_KEY\"]\n", + "\n", + "transformer = (\n", + " LangchainTransformer()\n", + " .setInputCol(\"technology\")\n", + " .setOutputCol(\"definition\")\n", + " .setChain(chain)\n", + " .setSubscriptionKey(openai_api_key)\n", + " .setUrl(api_base)\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "c74293f0-925e-4987-a6a1-b3b9b8e14b9d", + "metadata": { + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Construct a DataFrame with technology names." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "8e03963e-2fcf-4934-b96f-ac27b4e0353c", + "metadata": { + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [ + { + "data": { + "application/vnd.livy.statement-meta+json": { + "execution_finish_time": "2024-10-31T04:24:08.3891172Z", + "execution_start_time": "2024-10-31T04:24:02.0675933Z", + "livy_statement_state": "available", + "normalized_state": "finished", + "parent_msg_id": "856f5b73-26e8-4d20-a901-356cd92b9c2a", + "queued_time": "2024-10-31T04:24:01.6603792Z", + "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325", + "session_start_time": null, + "spark_pool": null, + "state": "finished", + "statement_id": 29, + "statement_ids": [ + 29 + ] + }, + "text/plain": [ + "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 29, Finished, Available, Finished)" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "+-----+----------+----------------------+\n", + "|label|technology|transformed_technology|\n", + "+-----+----------+----------------------+\n", + "| 0| docker| DOCKER|\n", + "| 1| spark| SPARK|\n", + "| 2| python| PYTHON|\n", + "+-----+----------+----------------------+\n", + "\n" + ] + } + ], + "source": [ + "from pyspark.sql import SparkSession\n", + "from pyspark.sql.functions import udf\n", + "from pyspark.sql.types import StringType\n", + "\n", + "# Initialize Spark session\n", + "spark = SparkSession.builder.appName(\"example\").getOrCreate()\n", + "\n", + "# Construct a DataFrame with technology names\n", + "df = spark.createDataFrame(\n", + " [\n", + " (0, \"docker\"), (1, \"spark\"), (2, \"python\")\n", + " ],\n", + " [\"label\", \"technology\"]\n", + ")\n", + "\n", + "# Define a simple UDF to transform the technology column\n", + "def transform_technology(tech):\n", + " return tech.upper()\n", + "\n", + "# Register the UDF\n", + "transform_udf = udf(transform_technology, StringType())\n", + "\n", + "# Apply the UDF to the DataFrame\n", + "transformed_df = df.withColumn(\"transformed_technology\", transform_udf(df[\"technology\"]))\n", + "\n", + "# Show the transformed DataFrame\n", + "transformed_df.show()" + ] + }, + { + "cell_type": "markdown", + "id": "47ab1ba6-deaf-488d-9e95-8202669d948c", + "metadata": { + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Extract content from PDFs linked in arXiv papers and generate prompts for extracting specific information.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "8b52c87e-5971-4d28-bc4b-4160d29a1c24", + "metadata": { + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [ + { + "data": { + "application/vnd.livy.statement-meta+json": { + "execution_finish_time": "2024-10-31T04:27:08.3224773Z", + "execution_start_time": "2024-10-31T04:27:08.0430507Z", + "livy_statement_state": "available", + "normalized_state": "finished", + "parent_msg_id": "4eeab690-4159-41dc-be69-3cceed484314", + "queued_time": "2024-10-31T04:27:07.6309068Z", + "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325", + "session_start_time": null, + "spark_pool": null, + "state": "finished", + "statement_id": 30, + "statement_ids": [ + 30 + ] + }, + "text/plain": [ + "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 30, Finished, Available, Finished)" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from langchain.document_loaders import OnlinePDFLoader\n", + "\n", + "def paper_content_extraction(inputs: dict) -> dict:\n", + " arxiv_link = inputs[\"arxiv_link\"]\n", + " loader = OnlinePDFLoader(arxiv_link)\n", + " pages = loader.load_and_split()\n", + " return {\"paper_content\": pages[0].page_content + pages[1].page_content}\n", + "\n", + "def prompt_generation(inputs: dict) -> dict:\n", + " output = inputs[\"Output\"]\n", + " prompt = (\n", + " \"find the paper title, author, summary in the paper description below, output them. \"\n", + " \"After that, Use websearch to find out 3 recent papers of the first author in the author section below \"\n", + " \"(first author is the first name separated by comma) and list the paper titles in bullet points: \"\n", + " \"\\n\" + output + \".\"\n", + " )\n", + " return {\"prompt\": prompt}" + ] + }, + { + "cell_type": "markdown", + "id": "89d79c38-ba0c-4062-911c-7ede02536298", + "metadata": { + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Set up a chain to extract structured information from an arXiv link\n" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "e85241a0-11c2-49c1-9b2e-63187cb24d9a", + "metadata": { + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [ + { + "data": { + "application/vnd.livy.statement-meta+json": { + "execution_finish_time": "2024-10-31T04:28:11.2331925Z", + "execution_start_time": "2024-10-31T04:28:11.0134852Z", + "livy_statement_state": "available", + "normalized_state": "finished", + "parent_msg_id": "232b4aa0-1b84-47f8-bb5d-347a575d9640", + "queued_time": "2024-10-31T04:28:10.663514Z", + "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325", + "session_start_time": null, + "spark_pool": null, + "state": "finished", + "statement_id": 31, + "statement_ids": [ + 31 + ] + }, + "text/plain": [ + "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 31, Finished, Available, Finished)" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from langchain.chains import TransformChain, SimpleSequentialChain\n", + "\n", + "paper_content_extraction_chain = TransformChain(\n", + " input_variables=[\"arxiv_link\"],\n", + " output_variables=[\"paper_content\"],\n", + " transform=paper_content_extraction,\n", + " verbose=False,\n", + ")\n", + "\n", + "paper_summarizer_template = \"\"\"\n", + "You are a paper summarizer, given the paper content, it is your job to summarize the paper into a short summary, \n", + "and extract authors and paper title from the paper content.\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "id": "64937339-791c-4aad-953b-ca990bfd324a", + "metadata": { + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Use Microsoft Fabric's native integration with the MLflow framework to log the trained machine learning models, the used hyperparameters, and evaluation metrics." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "5bac7684-a123-4733-baa3-a748ff0fd070", + "metadata": { + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [ + { + "data": { + "application/vnd.livy.statement-meta+json": { + "execution_finish_time": "2024-10-31T04:36:54.8917645Z", + "execution_start_time": "2024-10-31T04:36:44.7561664Z", + "livy_statement_state": "available", + "normalized_state": "finished", + "parent_msg_id": "d2abef17-25d7-41c4-a62f-051d9b5fe8d7", + "queued_time": "2024-10-31T04:36:44.2999954Z", + "session_id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325", + "session_start_time": null, + "spark_pool": null, + "state": "finished", + "statement_id": 33, + "statement_ids": [ + 33 + ] + }, + "text/plain": [ + "StatementMeta(, 7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325, 33, Finished, Available, Finished)" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Registered model 'RandomForestRegressionModel' already exists. Creating a new version of this model...\n", + "2024/10/31 04:36:52 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: RandomForestRegressionModel, version 2\n", + "Created version '2' of model 'RandomForestRegressionModel'.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Model Name: RandomForestRegressionModel\n", + "Model Version: 2\n" + ] + }, + { + "data": { + "application/vnd.mlflow.run-widget+json": { + "data": { + "metrics": {}, + "params": { + "n_estimators": "3", + "random_state": "42" + }, + "tags": { + "mlflow.rootRunId": "20c75f63-d266-40b1-83f7-d9c76fd1f4f4", + "mlflow.runName": "icy_hamster_xr34qfzf", + "mlflow.user": "4b3a56ea-6f42-450e-b7c3-fb2932c7ac32", + "synapseml.experiment.artifactId": "17b41ab7-b0e0-4adc-9fc9-403dd72b6e5b", + "synapseml.experimentName": "Notebook-1", + "synapseml.livy.id": "7383b5d4-1dea-4b9b-85d6-fe5ef5b7d325", + "synapseml.notebook.artifactId": "789d5fef-b2a1-409b-996f-0cdb4e748a90", + "synapseml.user.id": "ea5a1fdc-a08c-493a-bce9-8422f28ecd05", + "synapseml.user.name": "System Administrator" + } + }, + "info": { + "artifact_uri": "sds://onelakewestus3.pbidedicated.windows.net/6361aeaa-b63a-44ea-b28f-26db10b31a6c/17b41ab7-b0e0-4adc-9fc9-403dd72b6e5b/20c75f63-d266-40b1-83f7-d9c76fd1f4f4/artifacts", + "end_time": 1730349412, + "experiment_id": "d52403ad-a9c2-41ba-b582-9b8e9a57917e", + "lifecycle_stage": "active", + "run_id": "20c75f63-d266-40b1-83f7-d9c76fd1f4f4", + "run_name": "", + "run_uuid": "20c75f63-d266-40b1-83f7-d9c76fd1f4f4", + "start_time": 1730349405, + "status": "FINISHED", + "user_id": "7ebfac85-3ebb-440f-a743-e52052051f6a" + }, + "inputs": { + "dataset_inputs": [] + } + } + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import mlflow\n", + "from mlflow.models import infer_signature\n", + "from sklearn.datasets import make_regression\n", + "from sklearn.ensemble import RandomForestRegressor\n", + "\n", + "# Generate synthetic regression data\n", + "X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)\n", + "\n", + "# Model parameters\n", + "params = {\"n_estimators\": 3, \"random_state\": 42}\n", + "\n", + "# Model tags for MLflow\n", + "model_tags = {\n", + " \"project_name\": \"grocery-forecasting\",\n", + " \"store_dept\": \"produce\",\n", + " \"team\": \"stores-ml\",\n", + " \"project_quarter\": \"Q3-2023\"\n", + "}\n", + "\n", + "# Log MLflow entities\n", + "with mlflow.start_run() as run:\n", + " # Train the model\n", + " model = RandomForestRegressor(**params).fit(X, y)\n", + "\n", + " # Infer the model signature\n", + " signature = infer_signature(X, model.predict(X))\n", + "\n", + " # Log parameters and the model\n", + " mlflow.log_params(params)\n", + " mlflow.sklearn.log_model(model, artifact_path=\"sklearn-model\", signature=signature)\n", + "\n", + " # Register the model with tags\n", + " model_uri = f\"runs:/{run.info.run_id}/sklearn-model\"\n", + " model_version = mlflow.register_model(model_uri, \"RandomForestRegressionModel\", tags=model_tags)\n", + "\n", + " # Output model registration details\n", + " print(f\"Model Name: {model_version.name}\")\n", + " print(f\"Model Version: {model_version.version}\")" + ] + }, + { + "cell_type": "markdown", + "id": "315ebdcd-e78c-4bc5-93d6-f202d02bddc5", + "metadata": { + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "## Use MLflow to search among multiple models saved within the workspace" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "60e6f7d3-d1ec-4ccc-9745-6c7938d2f4bc", + "metadata": { + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark" + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [], + "source": [ + "from pprint import pprint\n", + "from mlflow.tracking import MlflowClient\n", + "\n", + "client = MlflowClient()\n", + "for rm in client.search_registered_models():\n", + " pprint(dict(rm), indent=4)" + ] + } + ], + "metadata": { + "application/vnd.jupyter.widget-state+json": { + "version": "1.0" + }, + "dependencies": { + "environment": { + "environmentId": "766562be-9e21-456c-b270-cac7e4bf8d18", + "workspaceId": "6361aeaa-b63a-44ea-b28f-26db10b31a6c" + } + }, + "kernel_info": { + "name": "synapse_pyspark" + }, + "kernelspec": { + "display_name": "Synapse PySpark", + "language": "Python", + "name": "synapse_pyspark" + }, + "language_info": { + "name": "python" + }, + "microsoft": { + "language": "python", + "language_group": "synapse_pyspark", + "ms_spell_check": { + "ms_spell_check_language": "en" + } + }, + "nteract": { + "version": "nteract-front-end@1.0.0" + }, + "spark_compute": { + "compute_id": "/trident/default", + "session_options": { + "conf": { + "spark.synapse.nbs.session.timeout": "1200000" + } + } + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": {}, + "version": "1.0" + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From b3bca562620378a5a73442923a8b781c3b8c6545 Mon Sep 17 00:00:00 2001 From: Timna Brown <24630902+brown9804@users.noreply.github.com> Date: Mon, 28 Apr 2025 22:56:03 -0600 Subject: [PATCH 5/5] ref added --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 278e35f..d626b0b 100644 --- a/README.md +++ b/README.md @@ -37,7 +37,7 @@ Last updated: 2025-04-29 https://github.com/user-attachments/assets/c199156f-96cf-4ed0-a8b5-c88db3e7a552 -| Feature / Platform | Azure Machine Learning | Microsoft Fabric | Azure AI Foundry | +| Feature / Platform | Azure Machine Learning | [Microsoft Fabric](./msFabric-AI_integration) | Azure AI Foundry | |--------------------------|----------------------------------------------------------|-----------------------------------------------------------|-----------------------------------------------------------| | **Purpose** | End-to-end ML lifecycle management and MLOps | Unified data analytics and business intelligence platform | Unified platform for building and deploying AI solutions | | **Model Deployment** | Supports real-time and batch deployment via AKS, ACI | Limited ML deployment; integrates with Azure ML | Deploys models as APIs or services within projects |