feat(olostep): add Olostep service bubble for web scraping and AI con… #223

ZeeshanAdilButt · 2025-12-16T08:11:05Z

…tent extraction

Add OlostepBubble with 5 operations: scrape, batch, crawl, map, answer
Add OLOSTEP_API_KEY credential type and configuration
Add 'olostep' to BubbleName type
Register bubble in BubbleFactory
Export from bubble-core index
Add comprehensive unit tests
Add credential UI configuration in bubble-studio

Summary

Adds an Olostep integration to BubbleLab for web scraping and AI-powered content extraction, with 5 operations: scrape, batch, crawl, map, and answer.

Type of Change

New feature

Checklist

My code follows the code style of this project
I have added appropriate tests for my changes
I have run pnpm check and all tests pass (pre-existing lint errors in bubblelab-api are unrelated to this PR)
I have tested my changes locally

Screenshots (if applicable)

Additional Context

bubblelab-pearl · 2025-12-16T08:11:29Z

Suggested PR title from Pearl

Title: feat: add olostep service bubble for web scraping

Body:

Summary

Adds a new service bubble integration for the Olostep API, enabling web scraping and AI-powered content extraction.

Features

Implements 5 core operations:

Scrape: Extract content from a single URL in multiple formats (markdown, HTML, JSON, text)
Batch: Scrape up to 1000 URLs in a single request
Crawl: Crawl websites and extract content from multiple pages
Map: Discover all URLs on a website for sitemap generation
Answer: AI-powered question answering using web content as context
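The five operations map naturally onto a TypeScript discriminated union keyed on operation. The sketch below mirrors the parameter shapes used in the PR's test file (url, urls, start_url, max_pages, top_n, task, context_urls); which fields are optional, the lowercase format literals, and the seedUrls helper are assumptions for illustration, not the bubble's actual OlostepParamsInput definition.

```typescript
// Output formats from the feature list; the lowercase literals are an assumption.
type OutputFormat = 'markdown' | 'html' | 'json' | 'text';

// Field names mirror the params in olostep.test.ts; optionality is assumed.
type OlostepParams =
  | {
      operation: 'scrape';
      url: string;
      formats?: OutputFormat[];
      country?: string;
      wait_before_scraping?: number;
      parser?: string;
    }
  | { operation: 'batch'; urls: string[] }
  | { operation: 'crawl'; start_url: string; max_pages?: number }
  | { operation: 'map'; url: string; top_n?: number }
  | { operation: 'answer'; task: string; context_urls?: string[] };

// Hypothetical helper: switching on the discriminator narrows params,
// so each case can only touch the fields its variant declares.
function seedUrls(params: OlostepParams): string[] {
  switch (params.operation) {
    case 'scrape':
    case 'map':
      return [params.url];
    case 'batch':
      return params.urls;
    case 'crawl':
      return [params.start_url];
    case 'answer':
      return params.context_urls ?? [];
  }
}
```

Because each case narrows params, reading params.urls inside the 'scrape' case would be a compile error rather than a runtime undefined.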

Changes

Add OlostepBubble service class supporting all 5 operations
Add OLOSTEP_API_KEY credential type and management
Register Olostep bubble in BubbleFactory
Add credential configuration UI in CredentialsPage
Add comprehensive test suite with 354 lines of tests
Update shared schemas and type definitions
Export Olostep bubble and types from bubble-core

Use Cases

Content extraction and data collection
Website monitoring and change detection
Research and competitive analysis
Lead generation and data enrichment
Building AI agents with web access
Automated content summarization

Technical Details

Integration with the Olostep v1 API
Support for structured parsers (LinkedIn, Twitter, GitHub, etc.)
Zod schema validation using discriminated unions
Geo-targeting support with country codes
Configurable wait times and output formats
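At runtime, a discriminated union reads the operation field first and then validates only the matching variant's fields. The hand-rolled sketch below shows that dispatch for two variants without depending on Zod (the PR itself uses Zod's discriminated unions). The 1000-URL cap comes from the feature list; the two-letter country code and non-negative wait rules are illustrative assumptions, and parseParams is a hypothetical name.

```typescript
type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

type ScrapeInput = {
  operation: 'scrape';
  url: string;
  country?: string;
  wait_before_scraping?: number;
};
type BatchInput = { operation: 'batch'; urls: string[] };

// Batch cap taken from the feature list; the other rules below are assumptions.
const MAX_BATCH_URLS = 1000;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Reads the discriminator first, then validates only the matching variant.
function parseParams(input: unknown): ParseResult<ScrapeInput | BatchInput> {
  if (!isRecord(input)) return { ok: false, error: 'params must be an object' };

  switch (input.operation) {
    case 'scrape': {
      const { url, country, wait_before_scraping: wait } = input;
      if (typeof url !== 'string' || url.length === 0) {
        return { ok: false, error: 'scrape: url must be a non-empty string' };
      }
      // Assumed format: uppercase two-letter codes such as 'US'.
      if (country !== undefined && !(typeof country === 'string' && /^[A-Z]{2}$/.test(country))) {
        return { ok: false, error: 'scrape: country must be a two-letter code like "US"' };
      }
      if (wait !== undefined && !(typeof wait === 'number' && wait >= 0)) {
        return { ok: false, error: 'scrape: wait_before_scraping must be a non-negative number' };
      }
      return {
        ok: true,
        value: {
          operation: 'scrape',
          url,
          country: typeof country === 'string' ? country : undefined,
          wait_before_scraping: typeof wait === 'number' ? wait : undefined,
        },
      };
    }
    case 'batch': {
      const { urls } = input;
      if (!Array.isArray(urls) || urls.length === 0 || !urls.every((u) => typeof u === 'string')) {
        return { ok: false, error: 'batch: urls must be a non-empty array of strings' };
      }
      if (urls.length > MAX_BATCH_URLS) {
        return { ok: false, error: `batch: at most ${MAX_BATCH_URLS} URLs per request` };
      }
      return { ok: true, value: { operation: 'batch', urls: urls as string[] } };
    }
    default:
      // crawl, map and answer would follow the same pattern in a full version.
      return { ok: false, error: `unsupported operation: ${String(input.operation)}` };
  }
}
```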

Copilot

Pull request overview

This PR adds comprehensive integration for Olostep, a web scraping and AI-powered content extraction service, to the BubbleLab platform. The implementation follows established patterns in the codebase and includes proper credential management, schema definitions, factory registration, and test coverage.

Key Changes:

Adds OlostepBubble service with 5 operations: scrape, batch, crawl, map, and answer
Introduces OLOSTEP_API_KEY credential type with full configuration across packages
Registers the bubble in the factory and exports it from the core package

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


File	Description
packages/bubble-shared-schemas/src/types.ts	Adds OLOSTEP_API_KEY credential type and 'olostep' to BubbleName union type
packages/bubble-shared-schemas/src/credential-schema.ts	Registers OLOSTEP_API_KEY in credential environment mapping and bubble credential options
packages/bubble-shared-schemas/src/bubble-definition-schema.ts	Adds empty credential configuration map entry for OLOSTEP_API_KEY
packages/bubble-core/src/bubbles/service-bubble/olostep.ts	Implements OlostepBubble class with discriminated union schemas for 5 operations, API client methods, and error handling
packages/bubble-core/src/bubbles/service-bubble/olostep.test.ts	Provides comprehensive unit tests covering registration, schemas, metadata, credentials, and all 5 operations
packages/bubble-core/src/bubble-factory.ts	Registers OlostepBubble in factory and includes it in code generator bubble list
packages/bubble-core/src/index.ts	Exports OlostepBubble class and OlostepParamsInput type from package
apps/bubble-studio/src/pages/CredentialsPage.tsx	Adds UI configuration for OLOSTEP_API_KEY credential with label, description, and placeholder

Copilot · 2025-12-16T08:25:59Z

packages/bubble-core/src/bubbles/service-bubble/olostep.test.ts

+describe('OlostepBubble', () => {
+  //
+  // REGISTRATION & SCHEMA
+  //
+  describe('Registration & Schema', () => {
+    it('should be registered in BubbleRegistry', async () => {
+      const bubbleClass = factory.get('olostep');
+      expect(bubbleClass).toBeDefined();
+      expect(bubbleClass).toBe(OlostepBubble);
+    });
+
+    it('schema should be a Zod discriminated union based on "operation"', () => {
+      const schema = OlostepBubble.schema;
+      expect(schema).toBeDefined();
+
+      // Validate ZodDiscriminatedUnion
+      expect(schema instanceof ZodDiscriminatedUnion).toBe(true);
+
+      const du = schema as ZodDiscriminatedUnion<
+        'operation',
+        readonly ZodObject<any>[]
+      >;
+      expect(du.discriminator).toBe('operation');
+
+      const operationValues = du.options.map((o) => o.shape.operation.value);
+
+      expect(operationValues).toContain('scrape');
+      expect(operationValues).toContain('batch');
+      expect(operationValues).toContain('crawl');
+      expect(operationValues).toContain('map');
+      expect(operationValues).toContain('answer');
+    });
+
+    it('result schema should validate a sample scrape result', () => {
+      const sample = {
+        operation: 'scrape',
+        success: true,
+        error: '',
+        markdown_content: '# Hello World',
+      };
+
+      const parsed = OlostepBubble.resultSchema.safeParse(sample);
+      expect(parsed.success).toBe(true);
+    });
+  });
+
+  //
+  // METADATA
+  //
+  describe('Metadata Tests', () => {
+    it('should have correct metadata', () => {
+      const metadata = factory.getMetadata('olostep');
+
+      expect(metadata).toBeDefined();
+      expect(metadata?.name).toBe('olostep');
+      expect(metadata?.alias).toBe('web-scraper');
+      expect(metadata?.schema).toBeDefined();
+      expect(metadata?.resultSchema).toBeDefined();
+      expect(metadata?.shortDescription).toContain('Web scraping');
+      expect(metadata?.longDescription).toContain('Olostep');
+    });
+
+    it('static properties are correct', () => {
+      expect(OlostepBubble.bubbleName).toBe('olostep');
+      expect(OlostepBubble.alias).toBe('web-scraper');
+      expect(OlostepBubble.service).toBe('olostep');
+      expect(OlostepBubble.authType).toBe('apikey');
+      expect(OlostepBubble.type).toBe('service');
+      expect(OlostepBubble.schema).toBeDefined();
+      expect(OlostepBubble.resultSchema).toBeDefined();
+      expect(OlostepBubble.shortDescription).toContain('Web scraping');
+      expect(OlostepBubble.longDescription).toContain('Scrape');
+    });
+  });
+
+  //
+  // CREDENTIAL VALIDATION
+  //
+  describe('Credential Validation', () => {
+    it('should fail testCredential() with missing credentials', async () => {
+      const bubble = new OlostepBubble({
+        operation: 'scrape',
+        url: 'https://example.com',
+      });
+
+      const result = await bubble.testCredential();
+      expect(result).toBe(false);
+    });
+
+    it('should pass testCredential() with valid credentials', async () => {
+      mockFetch.mockResolvedValueOnce({
+        ok: true,
+        text: async () => JSON.stringify({ markdown_content: '# Test' }),
+      });
+
+      const bubble = new OlostepBubble({
+        operation: 'scrape',
+        url: 'https://example.com',
+        credentials: createTestCredentials(),
+      });
+
+      const result = await bubble.testCredential();
+      expect(result).toBe(true);
+    });
+  });
+
+  //
+  // SCRAPE OPERATION
+  //
+  describe('Scrape Operation', () => {
+    it('should create bubble with scrape operation', () => {
+      const params: OlostepParamsInput = {
+        operation: 'scrape',
+        url: 'https://example.com',
+        formats: ['markdown'],
+      };
+
+      const bubble = new OlostepBubble(params);
+      expect((bubble as any).params.operation).toBe('scrape');
+      expect((bubble as any).params.url).toBe('https://example.com');
+    });
+
+    it('should accept all scrape optional parameters', () => {
+      const params: OlostepParamsInput = {
+        operation: 'scrape',
+        url: 'https://example.com',
+        formats: ['markdown', 'html'],
+        country: 'US',
+        wait_before_scraping: 2000,
+        parser: '@olostep/product-page',
+      };
+
+      const bubble = new OlostepBubble(params);
+      expect((bubble as any).params.formats).toEqual(['markdown', 'html']);
+      expect((bubble as any).params.country).toBe('US');
+      expect((bubble as any).params.wait_before_scraping).toBe(2000);
+      expect((bubble as any).params.parser).toBe('@olostep/product-page');
+    });
+
+    it('should return success for scrape', async () => {
+      mockFetch.mockResolvedValueOnce({
+        ok: true,
+        text: async () =>
+          JSON.stringify({
+            markdown_content: '# Test Page',
+            metadata: { title: 'Test', url: 'https://example.com' },
+          }),
+      });
+
+      const bubble = new OlostepBubble({
+        operation: 'scrape',
+        url: 'https://example.com',
+        credentials: createTestCredentials(),
+      });
+
+      const res = await bubble.action();
+
+      expect(res.data.operation).toBe('scrape');
+      expect(res.success).toBe(true);
+      expect(res.error).toBe('');
+      expect(res.data.markdown_content).toBeDefined();
+    });
+  });
+
+  //
+  // BATCH OPERATION
+  //
+  describe('Batch Operation', () => {
+    it('should create bubble with batch operation', () => {
+      const params: OlostepParamsInput = {
+        operation: 'batch',
+        urls: ['https://example.com/1', 'https://example.com/2'],
+      };
+
+      const bubble = new OlostepBubble(params);
+      expect((bubble as any).params.operation).toBe('batch');
+      expect((bubble as any).params.urls).toHaveLength(2);
+    });
+
+    it('should return success for batch', async () => {
+      mockFetch.mockResolvedValueOnce({
+        ok: true,
+        text: async () =>
+          JSON.stringify({
+            batch_id: 'batch_123',
+            status: 'processing',
+          }),
+      });
+
+      const bubble = new OlostepBubble({
+        operation: 'batch',
+        urls: ['https://example.com/1', 'https://example.com/2'],
+        credentials: createTestCredentials(),
+      });
+
+      const res = await bubble.action();
+
+      expect(res.data.operation).toBe('batch');
+      expect(res.success).toBe(true);
+      expect(res.data.batch_id).toBeDefined();
+    });
+  });
+
+  //
+  // CRAWL OPERATION
+  //
+  describe('Crawl Operation', () => {
+    it('should create bubble with crawl operation', () => {
+      const params: OlostepParamsInput = {
+        operation: 'crawl',
+        start_url: 'https://example.com',
+        max_pages: 50,
+      };
+
+      const bubble = new OlostepBubble(params);
+      expect((bubble as any).params.operation).toBe('crawl');
+      expect((bubble as any).params.start_url).toBe('https://example.com');
+      expect((bubble as any).params.max_pages).toBe(50);
+    });
+
+    it('should return success for crawl', async () => {
+      mockFetch.mockResolvedValueOnce({
+        ok: true,
+        text: async () =>
+          JSON.stringify({
+            crawl_id: 'crawl_123',
+            status: 'completed',
+            pages_crawled: 10,
+          }),
+      });
+
+      const bubble = new OlostepBubble({
+        operation: 'crawl',
+        start_url: 'https://example.com',
+        credentials: createTestCredentials(),
+      });
+
+      const res = await bubble.action();
+
+      expect(res.data.operation).toBe('crawl');
+      expect(res.success).toBe(true);
+      expect(res.data.crawl_id).toBeDefined();
+    });
+  });
+
+  //
+  // MAP OPERATION
+  //
+  describe('Map Operation', () => {
+    it('should create bubble with map operation', () => {
+      const params: OlostepParamsInput = {
+        operation: 'map',
+        url: 'https://example.com',
+        top_n: 200,
+      };
+
+      const bubble = new OlostepBubble(params);
+      expect((bubble as any).params.operation).toBe('map');
+      expect((bubble as any).params.url).toBe('https://example.com');
+      expect((bubble as any).params.top_n).toBe(200);
+    });
+
+    it('should return success for map', async () => {
+      mockFetch.mockResolvedValueOnce({
+        ok: true,
+        text: async () =>
+          JSON.stringify({
+            urls: ['https://example.com/page1', 'https://example.com/page2'],
+            total_urls: 2,
+          }),
+      });
+
+      const bubble = new OlostepBubble({
+        operation: 'map',
+        url: 'https://example.com',
+        credentials: createTestCredentials(),
+      });
+
+      const res = await bubble.action();
+
+      expect(res.data.operation).toBe('map');
+      expect(res.success).toBe(true);
+      expect(res.data.urls).toBeDefined();
+      expect(res.data.urls!.length).toBeGreaterThan(0);
+    });
+  });
+
+  //
+  // ANSWER OPERATION
+  //
+  describe('Answer Operation', () => {
+    it('should create bubble with answer operation', () => {
+      const params: OlostepParamsInput = {
+        operation: 'answer',
+        task: 'What is the main topic of this website?',
+        context_urls: ['https://example.com'],
+      };
+
+      const bubble = new OlostepBubble(params);
+      expect((bubble as any).params.operation).toBe('answer');
+      expect((bubble as any).params.task).toContain('main topic');
+    });
+
+    it('should return success for answer', async () => {
+      mockFetch.mockResolvedValueOnce({
+        ok: true,
+        text: async () =>
+          JSON.stringify({
+            answer: 'This website is about example content.',
+            citations: [{ url: 'https://example.com', title: 'Example' }],
+            sources_used: 1,
+          }),
+      });
+
+      const bubble = new OlostepBubble({
+        operation: 'answer',
+        task: 'What is this website about?',
+        credentials: createTestCredentials(),
+      });
+
+      const res = await bubble.action();
+
+      expect(res.data.operation).toBe('answer');
+      expect(res.success).toBe(true);
+      expect(res.data.answer).toBeDefined();
+    });
+  });
+});


The test suite is missing error-handling tests. Consider adding tests for:

API request failures (non-2xx responses)

Network errors (fetch throws exception)

Invalid API key responses

Malformed response data

Other service bubbles like ElevenLabsBubble include these types of tests to ensure robust error handling.
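The four failure modes above can be exercised by injecting a stub fetch. The sketch below assumes a hypothetical callOlostep helper that returns the same { success, error } fields the existing tests assert on; the bubble's real request code is not shown in this excerpt, so the helper, its error strings, and the choice to treat HTTP 401/403 as "invalid API key" are all assumptions.

```typescript
// Minimal response/fetch shapes so the sketch needs no DOM or Node typings.
interface ResponseLike {
  ok: boolean;
  status: number;
  text: () => Promise<string>;
}
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<ResponseLike>;

interface CallResult {
  success: boolean;
  error: string;
  data?: unknown;
}

// Hypothetical request helper covering the four failure modes.
async function callOlostep(fetchImpl: FetchLike, url: string, apiKey: string, body: unknown): Promise<CallResult> {
  let response: ResponseLike;
  try {
    response = await fetchImpl(url, {
      method: 'POST',
      headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    });
  } catch (err) {
    // Network error: fetch itself rejected.
    return { success: false, error: `Network error: ${err instanceof Error ? err.message : String(err)}` };
  }
  const raw = await response.text();
  // Assumption: the API signals a bad key with 401 or 403.
  if (response.status === 401 || response.status === 403) {
    return { success: false, error: `Invalid API key (HTTP ${response.status})` };
  }
  if (!response.ok) {
    return { success: false, error: `Olostep API error (HTTP ${response.status}): ${raw}` };
  }
  try {
    return { success: true, error: '', data: JSON.parse(raw) };
  } catch {
    return { success: false, error: 'Malformed response: body is not valid JSON' };
  }
}

// Stub factories standing in for the test file's mockFetch.
const respondWith = (status: number, body: string): FetchLike => async () => ({
  ok: status >= 200 && status < 300,
  status,
  text: async () => body,
});
const failWith = (message: string): FetchLike => async () => {
  throw new Error(message);
};
```

In the real suite the same cases could go through the existing mockFetch, for example mockFetch.mockResolvedValueOnce({ ok: false, status: 500, text: async () => 'error' }) or, if mockFetch is a Vitest/Jest mock, mockRejectedValueOnce(new Error('ECONNRESET')). "Malformed" here only means non-JSON; JSON that parses but has the wrong shape would also need a check such as OlostepBubble.resultSchema.safeParse.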

Copilot · 2025-12-16T08:25:59Z

packages/bubble-core/src/bubbles/service-bubble/olostep.ts

+    const apiKey = this.chooseCredential();
+    if (!apiKey) return false;
+
+    try {
+      // Simple health check with minimal scrape
+      const response = await fetch(`${OLOSTEP_API_URL}/scrapes`, {
+        method: 'POST',
+        headers: {
+          Authorization: `Bearer ${apiKey}`,
+          'Content-Type': 'application/json',
+        },
+        body: JSON.stringify({
+          url_to_scrape: 'https://example.com',
+          formats: ['text'],
+        }),
+      });
+      return response.ok;
+    } catch {
+      return false;
+    }
+  }
+


The testCredential method makes a full scrape request to example.com, which could be slow and might consume API credits. Consider using a lighter-weight endpoint for credential validation, such as a health check or user info endpoint if available from the Olostep API. If no such endpoint exists, this approach is acceptable but may result in slower credential validation and unnecessary API usage.
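If Olostep offers no cheaper endpoint, one way to limit the cost described above is to cache successful checks per key and bound how long a single check may take. The wrapper below is a hypothetical sketch: probe stands in for the minimal scrape that testCredential already performs, and makeCachedValidator, the TTL, and the timeout defaults are arbitrary choices, not part of the PR.

```typescript
// Performs the real check, e.g. the minimal POST /scrapes used by testCredential.
type CredentialProbe = (apiKey: string) => Promise<boolean>;

// Hypothetical wrapper: caches successful checks and bounds each probe's duration.
function makeCachedValidator(
  probe: CredentialProbe,
  timeoutMs = 10000,
  ttlMs = 10 * 60 * 1000,
  now: () => number = Date.now,
): (apiKey: string) => Promise<boolean> {
  // apiKey -> expiry timestamp of the last successful check (process memory only).
  const validUntil = new Map<string, number>();

  return async (apiKey: string): Promise<boolean> => {
    if (!apiKey) return false;
    const expiry = validUntil.get(apiKey);
    if (expiry !== undefined && now() < expiry) return true;

    let timer: ReturnType<typeof setTimeout> | undefined;
    // Resolves false if the probe has not settled within timeoutMs.
    const timeout = new Promise<boolean>((resolve) => {
      timer = setTimeout(() => resolve(false), timeoutMs);
    });

    let valid = false;
    try {
      valid = await Promise.race([probe(apiKey), timeout]);
    } catch {
      valid = false; // a thrown probe counts as "not validated"
    } finally {
      clearTimeout(timer);
    }

    // Only successes are cached, so a transient failure does not pin a key as invalid.
    if (valid) validUntil.set(apiKey, now() + ttlMs);
    return valid;
  };
}
```

Caching only positive results is the conservative choice: a revoked key may still pass until its TTL expires, but a network blip never locks out a good key.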

Copilot · 2025-12-16T08:25:59Z

packages/bubble-core/src/bubbles/service-bubble/olostep.ts

+        return { operation: 'scrape', ...base };
+    }
+  }


The default case returns an operation type of 'scrape', which could be misleading if an unknown operation somehow reaches this point. While TypeScript's type system should prevent this, consider either throwing an error or using a type assertion to indicate that this branch is unreachable. For example, throw new Error('Unreachable: invalid operation type'), or keep operation as 'scrape' with a comment explaining that this fallback should never occur.
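The suggestion corresponds to TypeScript's standard exhaustiveness check: pass the narrowed value to a function that accepts never, so adding a new operation without handling it becomes a compile error, and a bad value that slips past the types at runtime throws instead of being reported as 'scrape'. The names below (assertNever, buildErrorResult, OlostepOperation) are hypothetical; only the { operation, ...base } result shape comes from the excerpt.

```typescript
type OlostepOperation = 'scrape' | 'batch' | 'crawl' | 'map' | 'answer';

interface BaseResult {
  success: boolean;
  error: string;
}

type OperationResult = BaseResult & { operation: OlostepOperation };

// Compile-time exhaustiveness check that also fails loudly at runtime.
function assertNever(value: never): never {
  throw new Error(`Unreachable: invalid operation type ${String(value)}`);
}

// Hypothetical helper building a failure result for the requested operation.
function buildErrorResult(operation: OlostepOperation, error: string): OperationResult {
  const base: BaseResult = { success: false, error };
  switch (operation) {
    case 'scrape':
    case 'batch':
    case 'crawl':
    case 'map':
    case 'answer':
      return { operation, ...base };
    default:
      // If a member is added to OlostepOperation but not to the cases above,
      // `operation` is no longer `never` here and this line fails to compile.
      return assertNever(operation);
  }
}
```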


Copilot AI review requested due to automatic review settings December 16, 2025 08:11

Copilot started reviewing on behalf of ZeeshanAdilButt December 16, 2025 08:11

ZeeshanAdilButt force-pushed the feat/olostep-integration branch from e69c95c to a67e345 December 16, 2025 08:14

Copilot AI reviewed Dec 16, 2025

iqbalbhatti49 approved these changes Dec 16, 2025

ZeeshanAdilButt force-pushed the feat/olostep-integration branch from a67e345 to a29b5d3 December 17, 2025 06:38

Add error handling tests and improve exhaustive type checking

39dceca

ZeeshanAdilButt requested a review from iqbalbhatti49 December 22, 2025 13:03

3 participants

feat(olostep): add Olostep service bubble for web scraping and AI con… #223

Are you sure you want to change the base?

feat(olostep): add Olostep service bubble for web scraping and AI con… #223

Uh oh!

Conversation

ZeeshanAdilButt commented Dec 16, 2025

Summary

Type of Change

Checklist

Screenshots (if applicable)

Additional Context

Uh oh!

bubblelab-pearl commented Dec 16, 2025

Suggested PR title from Pearl

Summary

Features

Changes

Use Cases

Technical Details

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants