feat: update notebook consistent_imagery_generation.ipynb for Nano Banana GA (#2382)

PicardParis · web-flow · commit 3c76ade9f341 · 2025-10-02T12:52:25.000-04:00
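
One of the notebook remarks in this change states that when `aspect_ratio` is omitted, Gemini uses the input image's aspect ratio to select the closest supported one. The selection rule itself is not described in the notebook, so the following is only a minimal sketch under an assumption: "closest" is taken to mean the smallest difference between width/height ratios on a log scale. The helper name `closest_aspect_ratio` is hypothetical and not part of `google-genai`.

```python
import math

# Aspect ratios listed as supported in the updated notebook
SUPPORTED_ASPECT_RATIOS = [
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
]


def closest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported ratio nearest to width/height (hypothetical helper).

    Assumption: distance is measured between log ratios, so that portrait and
    landscape deviations are treated symmetrically. The service may use a
    different rule.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(label: str) -> float:
        w, h = map(int, label.split(":"))
        return abs(math.log(w / h) - target)

    return min(SUPPORTED_ASPECT_RATIOS, key=distance)
```

Under this assumption, the `1344 × 768` archive image used previously maps to `"16:9"`, which matches the value now set explicitly through `ImageConfig(aspect_ratio=ASPECT_RATIO)`.
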
diff --git a/gemini/use-cases/media-generation/consistent_imagery_generation.ipynb b/gemini/use-cases/media-generation/consistent_imagery_generation.ipynb
@@ -110,14 +110,19 @@
     "## 🔥 Challenge\n"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "![intro image](https://storage.googleapis.com/github-repo/generative-ai/gemini/use-cases/media-generation/consistent_imagery_generation/graph_animated.gif)\n"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {
     "id": "35E-CpC6qoNw"
    },
    "source": [
-    "![intro image](https://storage.googleapis.com/github-repo/generative-ai/gemini/use-cases/media-generation/consistent_imagery_generation/graph_animated.gif)\n",
-    "\n",
     "We all have existing images worth reusing in different contexts. This would generally imply modifying the images, a complex (if not impossible) task requiring very specific skills and tools. This explains why our archives are full of forgotten or unused treasures. State-of-the-art vision models have evolved so much that we can reconsider this problem.\n",
     "\n",
     "So, can we breathe new life into our visual archives?\n",
@@ -176,7 +181,7 @@
    },
    "outputs": [],
    "source": [
-    "%pip install --quiet \"google-genai>=1.38.0\" \"networkx[default]\""
+    "%pip install --quiet \"google-genai>=1.40.0\" \"networkx[default]\""
    ]
   },
   {
@@ -496,9 +501,9 @@
     "id": "9Ls-wVEq2jhc"
    },
    "source": [
-    "For this challenge, we'll select the latest Gemini 2.5 Flash Image model (currently in preview):\n",
+    "For this challenge, we'll select the latest Gemini 2.5 Flash Image model:\n",
     "\n",
-    "`GEMINI_2_5_FLASH_IMAGE = \"gemini-2.5-flash-image-preview\"`\n",
+    "`GEMINI_2_5_FLASH_IMAGE = \"gemini-2.5-flash-image\"`\n",
     "\n",
     "> 💡 \"Gemini 2.5 Flash Image\" is also known as \"Nano Banana\" 🍌\n"
    ]
@@ -534,10 +539,20 @@
     "import IPython.display\n",
     "import tenacity\n",
     "from google.genai.errors import ClientError\n",
-    "from google.genai.types import GenerateContentConfig, PIL_Image\n",
+    "from google.genai.types import GenerateContentConfig, ImageConfig, PIL_Image\n",
+    "\n",
+    "GEMINI_2_5_FLASH_IMAGE = \"gemini-2.5-flash-image\"\n",
+    "\n",
+    "# Add the \"TEXT\" modality to allow textual feedback alongside images (e.g., in iterative chat mode)\n",
+    "RESPONSE_MODALITIES = [\"IMAGE\"]\n",
+    "\n",
+    "# Supported aspect ratios: \"1:1\", \"2:3\", \"3:2\", \"3:4\", \"4:3\", \"4:5\", \"5:4\", \"9:16\", \"16:9\", and \"21:9\"\n",
+    "ASPECT_RATIO = \"16:9\"\n",
     "\n",
-    "GEMINI_2_5_FLASH_IMAGE = \"gemini-2.5-flash-image-preview\"\n",
-    "GENERATION_CONFIG = GenerateContentConfig(response_modalities=[\"TEXT\", \"IMAGE\"])\n",
+    "GENERATION_CONFIG = GenerateContentConfig(\n",
+    "    response_modalities=RESPONSE_MODALITIES,\n",
+    "    image_config=ImageConfig(aspect_ratio=ASPECT_RATIO),\n",
+    ")\n",
     "\n",
     "\n",
     "def generate_content(sources: list[PIL_Image], prompt: str) -> PIL_Image | None:\n",
@@ -712,15 +727,12 @@
     "import urllib.request\n",
     "\n",
     "import PIL.Image\n",
-    "import PIL.ImageOps\n",
     "\n",
     "ARCHIVE_URL = \"https://storage.googleapis.com/github-repo/generative-ai/gemini/use-cases/media-generation/consistent_imagery_generation/0_archive.png\"\n",
     "\n",
     "\n",
     "def load_archive() -> None:\n",
     "    image = get_image_from_url(ARCHIVE_URL)\n",
-    "    # Keep original details in 16:9 landscape aspect ratio (arbitrary)\n",
-    "    image = crop_expand_if_needed(image, 1344, 768)\n",
     "    assets.set_asset(Asset(AssetId.ARCHIVE, [], \"\", image))\n",
     "    display_image(image)\n",
     "\n",
@@ -730,21 +742,6 @@
     "        return PIL.Image.open(response)\n",
     "\n",
     "\n",
-    "def crop_expand_if_needed(image: PIL_Image, dst_w: int, dst_h: int) -> PIL_Image:\n",
-    "    src_w, src_h = image.size\n",
-    "    if dst_w < src_w or dst_h < src_h:\n",
-    "        crop_l, crop_t = (src_w - dst_w) // 2, (src_h - dst_h) // 2\n",
-    "        image = image.crop((crop_l, crop_t, crop_l + dst_w, crop_t + dst_h))\n",
-    "        src_w, src_h = image.size\n",
-    "    if src_w < dst_w or src_h < dst_h:\n",
-    "        off_l, off_t = (dst_w - src_w) // 2, (dst_h - src_h) // 2\n",
-    "        borders = (off_l, off_t, dst_w - src_w - off_l, dst_h - src_h - off_t)\n",
-    "        image = PIL.ImageOps.expand(image, borders, fill=\"white\")\n",
-    "\n",
-    "    assert image.size == (dst_w, dst_h)\n",
-    "    return image\n",
-    "\n",
-    "\n",
     "load_archive()"
    ]
   },
@@ -754,8 +751,6 @@
     "id": "0_752MsD2jhd"
    },
    "source": [
-    "> 💡 Gemini will preserve the closest aspect ratio of the last input image. Consequently, we cropped the archive image to `1344 × 768` pixels (close to `16:9`). This preserves the original details (no rescaling) and keeps the same landscape resolution in all our future scenes. Gemini can generate `1024 × 1024` images (`1:1`) but also their `16:9`, `9:16`, `4:3`, and `3:4` equivalents (in terms of tokens).\n",
-    "\n",
     "This archive image was generated in July 2024 with a beta version of Imagen 3, prompted with _\"On white background, a small hand-felted toy of blue robot. The felt is soft and cuddly…\"_. The result looked really good but, at the time, there was absolutely no determinism and no consistency. As a result, this was a nice one-shot image generation and the cute little robot seemed gone forever…\n"
    ]
   },
@@ -782,13 +777,11 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {
-    "id": "FZQ48d4i2jhd"
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "source_ids = [AssetId.ARCHIVE]\n",
-    "prompt = \"Extract the robot as is, without its shadow, replacing everything with a solid white fill.\"\n",
+    "prompt = \"Extract the robot in a clean cutout on a solid white fill.\"\n",
     "\n",
     "generate_image(source_ids, prompt)"
    ]
@@ -854,11 +847,12 @@
    "source": [
     "💡 A few remarks:\n",
     "\n",
-    "- The prompt describes the scene in terms of composition, as commonly used in media studios.\n",
-    "- If we try successive generations, they are consistent, with all robot features preserved.\n",
-    "- Our prompt does detail some aspects of the backpack, but we'll get slightly different backpacks for everything that's unspecified.\n",
-    "- For the sake of simplicity, we added the backpack directly in the character sheet but, in a real production pipeline, we would probably make it part of a separate accessory sheet.\n",
-    "- To control exactly the backpack shape and design, we could also use a reference photo and \"transform the backpack into a stylized felt version\".\n",
+    "- Our prompt focuses on the composition of the scene, a common practice in media studios.\n",
+    "- Successive generations will be consistent, preserving all robot features visible in the provided image. However, since we specified only some features of the backpack (e.g., a single buckle), the unspecified ones will vary and we'll get slightly different backpacks.\n",
+    "- For simplicity, we directly included the backpack in the character sheet. In a real production pipeline, we would likely make it part of a separate accessory sheet.\n",
+    "- To control the backpack's exact shape and design, we could also use a reference photo of a real backpack and instruct Gemini to \"transform the backpack into a stylized felt version.\"\n",
+    "- Gemini can generate `1024 × 1024` images (`1:1` aspect ratio) or equivalent resolutions (token-wise) for the other supported aspect ratios (`2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, and `21:9`).\n",
+    "- In the request configuration, we specified `aspect_ratio=\"16:9\"`, which generates images at `1344 × 768` pixels. If this parameter is omitted, Gemini uses the aspect ratio of the input image (the last one if multiple are provided) to select the closest supported aspect ratio.\n",
     "\n",
     "This new asset can now serve as a design reference in our future image generations.\n"
    ]
@@ -1829,7 +1823,8 @@
      "file_id": "1pmb_xmjaw8F4reXMLba3RO3QzxkGpuJd",
      "timestamp": 1736858287414
     }
-   ]
+   ],
+   "toc_visible": true
   },
   "kernelspec": {
    "display_name": "Python 3",