|
110 | 110 | "## 🔥 Challenge\n" |
111 | 111 | ] |
112 | 112 | }, |
| 113 | + { |
| 114 | + "cell_type": "markdown", |
| 115 | + "metadata": {}, |
| 116 | + "source": [ |
| 117 | + "\n" |
| 118 | + ] |
| 119 | + }, |
113 | 120 | { |
114 | 121 | "cell_type": "markdown", |
115 | 122 | "metadata": { |
116 | 123 | "id": "35E-CpC6qoNw" |
117 | 124 | }, |
118 | 125 | "source": [ |
119 | | - "\n", |
120 | | - "\n", |
121 | 126 | "We all have existing images worth reusing in different contexts. This would generally imply modifying the images, a complex (if not impossible) task requiring very specific skills and tools. This explains why our archives are full of forgotten or unused treasures. State-of-the-art vision models have evolved so much that we can reconsider this problem.\n", |
122 | 127 | "\n", |
123 | 128 | "So, can we breathe new life into our visual archives?\n", |
|
176 | 181 | }, |
177 | 182 | "outputs": [], |
178 | 183 | "source": [ |
179 | | - "%pip install --quiet \"google-genai>=1.38.0\" \"networkx[default]\"" |
| 184 | + "%pip install --quiet \"google-genai>=1.40.0\" \"networkx[default]\"" |
180 | 185 | ] |
181 | 186 | }, |
182 | 187 | { |
|
496 | 501 | "id": "9Ls-wVEq2jhc" |
497 | 502 | }, |
498 | 503 | "source": [ |
499 | | - "For this challenge, we'll select the latest Gemini 2.5 Flash Image model (currently in preview):\n", |
| 504 | + "For this challenge, we'll select the latest Gemini 2.5 Flash Image model:\n", |
500 | 505 | "\n", |
501 | | - "`GEMINI_2_5_FLASH_IMAGE = \"gemini-2.5-flash-image-preview\"`\n", |
| 506 | + "`GEMINI_2_5_FLASH_IMAGE = \"gemini-2.5-flash-image\"`\n", |
502 | 507 | "\n", |
503 | 508 | "> 💡 \"Gemini 2.5 Flash Image\" is also known as \"Nano Banana\" 🍌\n" |
504 | 509 | ] |
|
534 | 539 | "import IPython.display\n", |
535 | 540 | "import tenacity\n", |
536 | 541 | "from google.genai.errors import ClientError\n", |
537 | | - "from google.genai.types import GenerateContentConfig, PIL_Image\n", |
| 542 | + "from google.genai.types import GenerateContentConfig, ImageConfig, PIL_Image\n", |
| 543 | + "\n", |
| 544 | + "GEMINI_2_5_FLASH_IMAGE = \"gemini-2.5-flash-image\"\n", |
| 545 | + "\n", |
| 546 | + "# You can add the \"TEXT\" modality for potential textual feedback (or in iterative chat mode)\n", |
| 547 | + "RESPONSE_MODALITIES = [\"IMAGE\"]\n", |
| 548 | + "\n", |
| 549 | + "# Supported aspect ratios: \"1:1\", \"2:3\", \"3:2\", \"3:4\", \"4:3\", \"4:5\", \"5:4\", \"9:16\", \"16:9\", and \"21:9\"\n", |
| 550 | + "ASPECT_RATIO = \"16:9\"\n", |
538 | 551 | "\n", |
539 | | - "GEMINI_2_5_FLASH_IMAGE = \"gemini-2.5-flash-image-preview\"\n", |
540 | | - "GENERATION_CONFIG = GenerateContentConfig(response_modalities=[\"TEXT\", \"IMAGE\"])\n", |
| 552 | + "GENERATION_CONFIG = GenerateContentConfig(\n", |
| 553 | + " response_modalities=RESPONSE_MODALITIES,\n", |
| 554 | + " image_config=ImageConfig(aspect_ratio=ASPECT_RATIO),\n", |
| 555 | + ")\n", |
541 | 556 | "\n", |
542 | 557 | "\n", |
543 | 558 | "def generate_content(sources: list[PIL_Image], prompt: str) -> PIL_Image | None:\n", |
|
712 | 727 | "import urllib.request\n", |
713 | 728 | "\n", |
714 | 729 | "import PIL.Image\n", |
715 | | - "import PIL.ImageOps\n", |
716 | 730 | "\n", |
717 | 731 | "ARCHIVE_URL = \"https://storage.googleapis.com/github-repo/generative-ai/gemini/use-cases/media-generation/consistent_imagery_generation/0_archive.png\"\n", |
718 | 732 | "\n", |
719 | 733 | "\n", |
720 | 734 | "def load_archive() -> None:\n", |
721 | 735 | " image = get_image_from_url(ARCHIVE_URL)\n", |
722 | | - " # Keep original details in 16:9 landscape aspect ratio (arbitrary)\n", |
723 | | - " image = crop_expand_if_needed(image, 1344, 768)\n", |
724 | 736 | " assets.set_asset(Asset(AssetId.ARCHIVE, [], \"\", image))\n", |
725 | 737 | " display_image(image)\n", |
726 | 738 | "\n", |
|
730 | 742 | " return PIL.Image.open(response)\n", |
731 | 743 | "\n", |
732 | 744 | "\n", |
733 | | - "def crop_expand_if_needed(image: PIL_Image, dst_w: int, dst_h: int) -> PIL_Image:\n", |
734 | | - " src_w, src_h = image.size\n", |
735 | | - " if dst_w < src_w or dst_h < src_h:\n", |
736 | | - " crop_l, crop_t = (src_w - dst_w) // 2, (src_h - dst_h) // 2\n", |
737 | | - " image = image.crop((crop_l, crop_t, crop_l + dst_w, crop_t + dst_h))\n", |
738 | | - " src_w, src_h = image.size\n", |
739 | | - " if src_w < dst_w or src_h < dst_h:\n", |
740 | | - " off_l, off_t = (dst_w - src_w) // 2, (dst_h - src_h) // 2\n", |
741 | | - " borders = (off_l, off_t, dst_w - src_w - off_l, dst_h - src_h - off_t)\n", |
742 | | - " image = PIL.ImageOps.expand(image, borders, fill=\"white\")\n", |
743 | | - "\n", |
744 | | - " assert image.size == (dst_w, dst_h)\n", |
745 | | - " return image\n", |
746 | | - "\n", |
747 | | - "\n", |
748 | 745 | "load_archive()" |
749 | 746 | ] |
750 | 747 | }, |
|
754 | 751 | "id": "0_752MsD2jhd" |
755 | 752 | }, |
756 | 753 | "source": [ |
757 | | - "> 💡 Gemini will preserve the closest aspect ratio of the last input image. Consequently, we cropped the archive image to `1344 × 768` pixels (close to `16:9`). This preserves the original details (no rescaling) and keeps the same landscape resolution in all our future scenes. Gemini can generate `1024 × 1024` images (`1:1`) but also their `16:9`, `9:16`, `4:3`, and `3:4` equivalents (in terms of tokens).\n", |
758 | | - "\n", |
759 | 754 | "This archive image was generated in July 2024 with a beta version of Imagen 3, prompted with _\"On white background, a small hand-felted toy of blue robot. The felt is soft and cuddly…\"_. The result looked really good but, at the time, there was absolutely no determinism and no consistency. As a result, this was a nice one-shot image generation and the cute little robot seemed gone forever…\n" |
760 | 755 | ] |
761 | 756 | }, |
|
782 | 777 | { |
783 | 778 | "cell_type": "code", |
784 | 779 | "execution_count": null, |
785 | | - "metadata": { |
786 | | - "id": "FZQ48d4i2jhd" |
787 | | - }, |
| 780 | + "metadata": {}, |
788 | 781 | "outputs": [], |
789 | 782 | "source": [ |
790 | 783 | "source_ids = [AssetId.ARCHIVE]\n", |
791 | | - "prompt = \"Extract the robot as is, without its shadow, replacing everything with a solid white fill.\"\n", |
| 784 | + "prompt = \"Extract the robot in a clean cutout on a solid white fill.\"\n", |
792 | 785 | "\n", |
793 | 786 | "generate_image(source_ids, prompt)" |
794 | 787 | ] |
|
854 | 847 | "source": [ |
855 | 848 | "💡 A few remarks:\n", |
856 | 849 | "\n", |
857 | | - "- The prompt describes the scene in terms of composition, as commonly used in media studios.\n", |
858 | | - "- If we try successive generations, they are consistent, with all robot features preserved.\n", |
859 | | - "- Our prompt does detail some aspects of the backpack, but we'll get slightly different backpacks for everything that's unspecified.\n", |
860 | | - "- For the sake of simplicity, we added the backpack directly in the character sheet but, in a real production pipeline, we would probably make it part of a separate accessory sheet.\n", |
861 | | - "- To control exactly the backpack shape and design, we could also use a reference photo and \"transform the backpack into a stylized felt version\".\n", |
| 850 | + "- Our prompt focuses on the composition of the scene, a common practice in media studios.\n", |
| 851 | + "- Successive generated images will be consistent, preserving all robot features visible in the provided image. However, since we only specified some features of the backpack (e.g., a single buckle) and left others unspecified, we'll get slightly different backpacks.\n", |
| 852 | + "- For simplicity, we directly included the backpack in the character sheet. In a real production pipeline, we would likely make it part of a separate accessory sheet.\n", |
| 853 | + "- To control the backpack's exact shape and design, we could also use a reference photo of a real backpack and instruct Gemini to \"transform the backpack into a stylized felt version.\"\n", |
| 854 | + "- Gemini can generate `1024 × 1024` images (`1:1` aspect ratio) or equivalent resolutions (token-wise) for the other supported aspect ratios (`2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, and `21:9`).\n", |
| 855 | + "- In the request configuration, we specified `aspect_ratio=\"16:9\"`, which generates images at `1344 × 768` pixels. If this parameter is omitted, Gemini uses the aspect ratio of the input image (the last one if multiple are provided) to select the closest supported aspect ratio.\n", |
862 | 856 | "\n", |
863 | 857 | "This new asset can now serve as a design reference in our future image generations.\n" |
864 | 858 | ] |
|
1829 | 1823 | "file_id": "1pmb_xmjaw8F4reXMLba3RO3QzxkGpuJd", |
1830 | 1824 | "timestamp": 1736858287414 |
1831 | 1825 | } |
1832 | | - ] |
| 1826 | + ], |
| 1827 | + "toc_visible": true |
1833 | 1828 | }, |
1834 | 1829 | "kernelspec": { |
1835 | 1830 | "display_name": "Python 3", |
|
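The updated remarks cell says that, when `aspect_ratio` is omitted, Gemini selects the closest supported aspect ratio from the input image. The diff does not show how "closest" is measured, so the following is only a minimal sketch of one plausible reading. It assumes distance is compared on a log scale. The helper name `closest_aspect_ratio` is hypothetical and is not part of the notebook or of the `google-genai` SDK. The list of ratios is copied from the comment added in this change.

```python
import math

# Aspect ratios listed as supported in the notebook comment added by this change
SUPPORTED_ASPECT_RATIOS = [
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
]


def closest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported ratio nearest to width/height.

    Hypothetical helper: the log-scale distance is an assumption made for
    illustration, not the documented behavior of the model.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    target = math.log(width / height)

    def distance(ratio: str) -> float:
        w, h = ratio.split(":")
        # Comparing logs treats landscape and portrait deviations symmetrically
        return abs(math.log(int(w) / int(h)) - target)

    return min(SUPPORTED_ASPECT_RATIOS, key=distance)


# The archive image was previously cropped to 1344 x 768 pixels
print(closest_aspect_ratio(1344, 768))
```

Under this assumption, a `1344 × 768` input maps to `16:9`, which matches the value now set explicitly through `ImageConfig(aspect_ratio=ASPECT_RATIO)`. A helper like this could be used to check an input image before deciding whether the explicit parameter is needed.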