@ZeeshanAdilButt

…tent extraction

  • Add OlostepBubble with 5 operations: scrape, batch, crawl, map, answer
  • Add OLOSTEP_API_KEY credential type and configuration
  • Add 'olostep' to BubbleName type
  • Register bubble in BubbleFactory
  • Export from bubble-core index
  • Add comprehensive unit tests
  • Add credential UI configuration in bubble-studio

Summary

Adds an Olostep integration to BubbleLab: web scraping and AI-powered content extraction through 5 operations (scrape, batch, crawl, map, and answer).

Type of Change

  • New feature

Checklist

  • My code follows the code style of this project
  • I have added appropriate tests for my changes
  • I have run pnpm check and all tests pass (the pre-existing lint errors in bubblelab-api are unrelated to this PR)
  • I have tested my changes locally

Screenshots (if applicable)

Additional Context

Copilot AI review requested due to automatic review settings December 16, 2025 08:11
@bubblelab-pearl
Contributor

Suggested PR title from Pearl

Title: feat: add olostep service bubble for web scraping

Body:

Summary

Adds a new service bubble integration for Olostep API, enabling web scraping and AI-powered content extraction capabilities.

Features

Implements 5 core operations:

  • Scrape: Extract content from a single URL in multiple formats (markdown, HTML, JSON, text)
  • Batch: Scrape up to 1000 URLs in a single request
  • Crawl: Crawl websites and extract content from multiple pages
  • Map: Discover all URLs on a website for sitemap generation
  • Answer: AI-powered question answering using web content as context

Changes

  • Add OlostepBubble service class with comprehensive operation support
  • Add OLOSTEP_API_KEY credential type and management
  • Register Olostep bubble in BubbleFactory
  • Add credential configuration UI in CredentialsPage
  • Add comprehensive test suite with 354 lines of tests
  • Update shared schemas and type definitions
  • Export Olostep bubble and types from bubble-core

Use Cases

  • Content extraction and data collection
  • Website monitoring and change detection
  • Research and competitive analysis
  • Lead generation and data enrichment
  • Building AI agents with web access
  • Automated content summarization

Technical Details

  • API integration with Olostep v1 API
  • Support for structured parsers (LinkedIn, Twitter, GitHub, etc.)
  • Zod schema validation using discriminated unions
  • Geo-targeting support with country codes
  • Configurable wait times and output formats
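
The five operations can be modeled as a discriminated union keyed on `operation`. The sketch below uses plain TypeScript types rather than Zod, so it is not the PR's actual schema. Field names are taken from the tests in this PR, which fields are optional is an assumption, and `describeRequest` and `MAX_BATCH_URLS` are hypothetical names used only for illustration.

```typescript
// Plain-TypeScript stand-in for the PR's Zod discriminated union.
// Field names follow the PR's tests; optionality is an assumption.
type OlostepParams =
  | { operation: 'scrape'; url: string; formats?: string[]; country?: string }
  | { operation: 'batch'; urls: string[] }
  | { operation: 'crawl'; start_url: string; max_pages?: number }
  | { operation: 'map'; url: string; top_n?: number }
  | { operation: 'answer'; task: string; context_urls?: string[] };

// Batch limit quoted in the suggested PR body ("up to 1000 URLs").
const MAX_BATCH_URLS = 1000;

// Hypothetical helper: switching on `operation` narrows the type,
// so each branch can only touch the fields of its own variant.
function describeRequest(params: OlostepParams): string {
  switch (params.operation) {
    case 'scrape':
      return `scrape ${params.url}`;
    case 'batch':
      if (params.urls.length > MAX_BATCH_URLS) {
        throw new Error(`batch accepts at most ${MAX_BATCH_URLS} URLs`);
      }
      return `batch of ${params.urls.length} URLs`;
    case 'crawl':
      return `crawl from ${params.start_url}`;
    case 'map':
      return `map ${params.url}`;
    case 'answer':
      return `answer: ${params.task}`;
  }
}
```

Accessing `params.urls` inside the `'scrape'` branch would fail to compile, which is the main benefit of keying the union on a literal `operation` field.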

Copilot AI left a comment

Pull request overview

This PR adds comprehensive integration for Olostep, a web scraping and AI-powered content extraction service, to the BubbleLab platform. The implementation follows established patterns in the codebase and includes proper credential management, schema definitions, factory registration, and test coverage.

Key Changes:

  • Adds OlostepBubble service with 5 operations: scrape, batch, crawl, map, and answer
  • Introduces OLOSTEP_API_KEY credential type with full configuration across packages
  • Registers the bubble in the factory and exports it from the core package

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| packages/bubble-shared-schemas/src/types.ts | Adds OLOSTEP_API_KEY credential type and 'olostep' to BubbleName union type |
| packages/bubble-shared-schemas/src/credential-schema.ts | Registers OLOSTEP_API_KEY in credential environment mapping and bubble credential options |
| packages/bubble-shared-schemas/src/bubble-definition-schema.ts | Adds empty credential configuration map entry for OLOSTEP_API_KEY |
| packages/bubble-core/src/bubbles/service-bubble/olostep.ts | Implements OlostepBubble class with discriminated union schemas for 5 operations, API client methods, and error handling |
| packages/bubble-core/src/bubbles/service-bubble/olostep.test.ts | Provides comprehensive unit tests covering registration, schemas, metadata, credentials, and all 5 operations |
| packages/bubble-core/src/bubble-factory.ts | Registers OlostepBubble in factory and includes it in code generator bubble list |
| packages/bubble-core/src/index.ts | Exports OlostepBubble class and OlostepParamsInput type from package |
| apps/bubble-studio/src/pages/CredentialsPage.tsx | Adds UI configuration for OLOSTEP_API_KEY credential with label, description, and placeholder |

Comment on lines 27 to 354
describe('OlostepBubble', () => {
  //
  // REGISTRATION & SCHEMA
  //
  describe('Registration & Schema', () => {
    it('should be registered in BubbleRegistry', async () => {
      const bubbleClass = factory.get('olostep');
      expect(bubbleClass).toBeDefined();
      expect(bubbleClass).toBe(OlostepBubble);
    });

    it('schema should be a Zod discriminated union based on "operation"', () => {
      const schema = OlostepBubble.schema;
      expect(schema).toBeDefined();

      // Validate ZodDiscriminatedUnion
      expect(schema instanceof ZodDiscriminatedUnion).toBe(true);

      const du = schema as ZodDiscriminatedUnion<
        'operation',
        readonly ZodObject<any>[]
      >;
      expect(du.discriminator).toBe('operation');

      const operationValues = du.options.map((o) => o.shape.operation.value);

      expect(operationValues).toContain('scrape');
      expect(operationValues).toContain('batch');
      expect(operationValues).toContain('crawl');
      expect(operationValues).toContain('map');
      expect(operationValues).toContain('answer');
    });

    it('result schema should validate a sample scrape result', () => {
      const sample = {
        operation: 'scrape',
        success: true,
        error: '',
        markdown_content: '# Hello World',
      };

      const parsed = OlostepBubble.resultSchema.safeParse(sample);
      expect(parsed.success).toBe(true);
    });
  });

  //
  // METADATA
  //
  describe('Metadata Tests', () => {
    it('should have correct metadata', () => {
      const metadata = factory.getMetadata('olostep');

      expect(metadata).toBeDefined();
      expect(metadata?.name).toBe('olostep');
      expect(metadata?.alias).toBe('web-scraper');
      expect(metadata?.schema).toBeDefined();
      expect(metadata?.resultSchema).toBeDefined();
      expect(metadata?.shortDescription).toContain('Web scraping');
      expect(metadata?.longDescription).toContain('Olostep');
    });

    it('static properties are correct', () => {
      expect(OlostepBubble.bubbleName).toBe('olostep');
      expect(OlostepBubble.alias).toBe('web-scraper');
      expect(OlostepBubble.service).toBe('olostep');
      expect(OlostepBubble.authType).toBe('apikey');
      expect(OlostepBubble.type).toBe('service');
      expect(OlostepBubble.schema).toBeDefined();
      expect(OlostepBubble.resultSchema).toBeDefined();
      expect(OlostepBubble.shortDescription).toContain('Web scraping');
      expect(OlostepBubble.longDescription).toContain('Scrape');
    });
  });

  //
  // CREDENTIAL VALIDATION
  //
  describe('Credential Validation', () => {
    it('should fail testCredential() with missing credentials', async () => {
      const bubble = new OlostepBubble({
        operation: 'scrape',
        url: 'https://example.com',
      });

      const result = await bubble.testCredential();
      expect(result).toBe(false);
    });

    it('should pass testCredential() with valid credentials', async () => {
      mockFetch.mockResolvedValueOnce({
        ok: true,
        text: async () => JSON.stringify({ markdown_content: '# Test' }),
      });

      const bubble = new OlostepBubble({
        operation: 'scrape',
        url: 'https://example.com',
        credentials: createTestCredentials(),
      });

      const result = await bubble.testCredential();
      expect(result).toBe(true);
    });
  });

  //
  // SCRAPE OPERATION
  //
  describe('Scrape Operation', () => {
    it('should create bubble with scrape operation', () => {
      const params: OlostepParamsInput = {
        operation: 'scrape',
        url: 'https://example.com',
        formats: ['markdown'],
      };

      const bubble = new OlostepBubble(params);
      expect((bubble as any).params.operation).toBe('scrape');
      expect((bubble as any).params.url).toBe('https://example.com');
    });

    it('should accept all scrape optional parameters', () => {
      const params: OlostepParamsInput = {
        operation: 'scrape',
        url: 'https://example.com',
        formats: ['markdown', 'html'],
        country: 'US',
        wait_before_scraping: 2000,
        parser: '@olostep/product-page',
      };

      const bubble = new OlostepBubble(params);
      expect((bubble as any).params.formats).toEqual(['markdown', 'html']);
      expect((bubble as any).params.country).toBe('US');
      expect((bubble as any).params.wait_before_scraping).toBe(2000);
      expect((bubble as any).params.parser).toBe('@olostep/product-page');
    });

    it('should return success for scrape', async () => {
      mockFetch.mockResolvedValueOnce({
        ok: true,
        text: async () =>
          JSON.stringify({
            markdown_content: '# Test Page',
            metadata: { title: 'Test', url: 'https://example.com' },
          }),
      });

      const bubble = new OlostepBubble({
        operation: 'scrape',
        url: 'https://example.com',
        credentials: createTestCredentials(),
      });

      const res = await bubble.action();

      expect(res.data.operation).toBe('scrape');
      expect(res.success).toBe(true);
      expect(res.error).toBe('');
      expect(res.data.markdown_content).toBeDefined();
    });
  });

  //
  // BATCH OPERATION
  //
  describe('Batch Operation', () => {
    it('should create bubble with batch operation', () => {
      const params: OlostepParamsInput = {
        operation: 'batch',
        urls: ['https://example.com/1', 'https://example.com/2'],
      };

      const bubble = new OlostepBubble(params);
      expect((bubble as any).params.operation).toBe('batch');
      expect((bubble as any).params.urls).toHaveLength(2);
    });

    it('should return success for batch', async () => {
      mockFetch.mockResolvedValueOnce({
        ok: true,
        text: async () =>
          JSON.stringify({
            batch_id: 'batch_123',
            status: 'processing',
          }),
      });

      const bubble = new OlostepBubble({
        operation: 'batch',
        urls: ['https://example.com/1', 'https://example.com/2'],
        credentials: createTestCredentials(),
      });

      const res = await bubble.action();

      expect(res.data.operation).toBe('batch');
      expect(res.success).toBe(true);
      expect(res.data.batch_id).toBeDefined();
    });
  });

  //
  // CRAWL OPERATION
  //
  describe('Crawl Operation', () => {
    it('should create bubble with crawl operation', () => {
      const params: OlostepParamsInput = {
        operation: 'crawl',
        start_url: 'https://example.com',
        max_pages: 50,
      };

      const bubble = new OlostepBubble(params);
      expect((bubble as any).params.operation).toBe('crawl');
      expect((bubble as any).params.start_url).toBe('https://example.com');
      expect((bubble as any).params.max_pages).toBe(50);
    });

    it('should return success for crawl', async () => {
      mockFetch.mockResolvedValueOnce({
        ok: true,
        text: async () =>
          JSON.stringify({
            crawl_id: 'crawl_123',
            status: 'completed',
            pages_crawled: 10,
          }),
      });

      const bubble = new OlostepBubble({
        operation: 'crawl',
        start_url: 'https://example.com',
        credentials: createTestCredentials(),
      });

      const res = await bubble.action();

      expect(res.data.operation).toBe('crawl');
      expect(res.success).toBe(true);
      expect(res.data.crawl_id).toBeDefined();
    });
  });

  //
  // MAP OPERATION
  //
  describe('Map Operation', () => {
    it('should create bubble with map operation', () => {
      const params: OlostepParamsInput = {
        operation: 'map',
        url: 'https://example.com',
        top_n: 200,
      };

      const bubble = new OlostepBubble(params);
      expect((bubble as any).params.operation).toBe('map');
      expect((bubble as any).params.url).toBe('https://example.com');
      expect((bubble as any).params.top_n).toBe(200);
    });

    it('should return success for map', async () => {
      mockFetch.mockResolvedValueOnce({
        ok: true,
        text: async () =>
          JSON.stringify({
            urls: ['https://example.com/page1', 'https://example.com/page2'],
            total_urls: 2,
          }),
      });

      const bubble = new OlostepBubble({
        operation: 'map',
        url: 'https://example.com',
        credentials: createTestCredentials(),
      });

      const res = await bubble.action();

      expect(res.data.operation).toBe('map');
      expect(res.success).toBe(true);
      expect(res.data.urls).toBeDefined();
      expect(res.data.urls!.length).toBeGreaterThan(0);
    });
  });

  //
  // ANSWER OPERATION
  //
  describe('Answer Operation', () => {
    it('should create bubble with answer operation', () => {
      const params: OlostepParamsInput = {
        operation: 'answer',
        task: 'What is the main topic of this website?',
        context_urls: ['https://example.com'],
      };

      const bubble = new OlostepBubble(params);
      expect((bubble as any).params.operation).toBe('answer');
      expect((bubble as any).params.task).toContain('main topic');
    });

    it('should return success for answer', async () => {
      mockFetch.mockResolvedValueOnce({
        ok: true,
        text: async () =>
          JSON.stringify({
            answer: 'This website is about example content.',
            citations: [{ url: 'https://example.com', title: 'Example' }],
            sources_used: 1,
          }),
      });

      const bubble = new OlostepBubble({
        operation: 'answer',
        task: 'What is this website about?',
        credentials: createTestCredentials(),
      });

      const res = await bubble.action();

      expect(res.data.operation).toBe('answer');
      expect(res.success).toBe(true);
      expect(res.data.answer).toBeDefined();
    });
  });
});
Copilot AI Dec 16, 2025

The test suite is missing error handling tests. Consider adding tests for:

  • API request failures (non-2xx responses)
  • Network errors (fetch throws exception)
  • Invalid API key responses
  • Malformed response data

Other service bubbles like ElevenLabsBubble include these types of tests to ensure robust error handling.
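
The four failure modes listed above can be exercised with fake fetch implementations. The following is a minimal, self-contained sketch under stated assumptions: `requestJson` is a hypothetical stand-in for the bubble's request path (not the PR's `OlostepBubble`), the `{ success, error }` result shape mirrors what the existing tests assert on, and the URL is a placeholder.

```typescript
// Hypothetical stand-in for the bubble's request path; NOT the PR's implementation.
type FetchLike = (
  url: string
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;
type Result = { success: boolean; error: string; data?: unknown };

async function requestJson(fetchImpl: FetchLike, url: string): Promise<Result> {
  try {
    const response = await fetchImpl(url);
    const body = await response.text();
    if (!response.ok) {
      // Non-2xx responses, including invalid API keys, surface as errors.
      return { success: false, error: `HTTP ${response.status}: ${body}` };
    }
    try {
      return { success: true, error: '', data: JSON.parse(body) };
    } catch {
      return { success: false, error: 'Malformed JSON in response body' };
    }
  } catch (err) {
    // fetch itself threw (for example, a network failure).
    return { success: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// One fake per failure mode named in the review comment.
const serverError: FetchLike = async () => ({
  ok: false, status: 500, text: async () => 'Internal Server Error',
});
const invalidKey: FetchLike = async () => ({
  ok: false, status: 401, text: async () => 'Invalid API key',
});
const networkDown: FetchLike = async () => {
  throw new Error('network unreachable');
};
const malformedBody: FetchLike = async () => ({
  ok: true, status: 200, text: async () => '<html>not json',
});

async function runErrorCases(): Promise<Result[]> {
  const url = 'https://api.example.invalid/scrapes'; // placeholder URL
  const fakes = [serverError, invalidKey, networkDown, malformedBody];
  return Promise.all(fakes.map((fake) => requestJson(fake, url)));
}
```

In the PR's own suite, each fake would presumably become a `mockFetch.mockResolvedValueOnce(...)` or `mockFetch.mockRejectedValueOnce(...)` call followed by assertions on `res.success` and `res.error`.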

Copilot uses AI. Check for mistakes.
Comment on lines +320 to +341
  const apiKey = this.chooseCredential();
  if (!apiKey) return false;

  try {
    // Simple health check with minimal scrape
    const response = await fetch(`${OLOSTEP_API_URL}/scrapes`, {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        url_to_scrape: 'https://example.com',
        formats: ['text'],
      }),
    });
    return response.ok;
  } catch {
    return false;
  }
}

Copilot AI Dec 16, 2025

The testCredential method makes a full scrape request to example.com, which could be slow and might consume API credits. Consider using a lighter-weight endpoint for credential validation, such as a health check or user info endpoint if available from the Olostep API. If no such endpoint exists, this approach is acceptable but may result in slower credential validation and unnecessary API usage.
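
One way to keep the check cheap, sketched under assumptions: make the probe endpoint configurable and bound it with a timeout. `checkCredential`, `CredentialProbe`, and the probe URL are hypothetical names; whether the Olostep API offers a lightweight endpoint is not confirmed here.

```typescript
// Minimal fetch shape needed by the check; injected so it can be faked in tests.
type ProbeFetch = (
  url: string,
  init: { method: string; headers: Record<string, string>; signal?: AbortSignal }
) => Promise<{ ok: boolean }>;

interface CredentialProbe {
  url: string;       // hypothetical lightweight endpoint (assumption, not a known Olostep route)
  timeoutMs: number; // upper bound on how long validation may take
}

async function checkCredential(
  apiKey: string | undefined,
  probe: CredentialProbe,
  fetchImpl: ProbeFetch
): Promise<boolean> {
  if (!apiKey) return false;

  // Abort the request if the probe takes longer than the configured timeout.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), probe.timeoutMs);
  try {
    const response = await fetchImpl(probe.url, {
      method: 'GET',
      headers: { Authorization: `Bearer ${apiKey}` },
      signal: controller.signal,
    });
    return response.ok;
  } catch {
    // Network errors and aborts both count as a failed check.
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```

If no lighter endpoint exists, the same shape still works with the current scrape request as the probe; the timeout then limits how slow validation can get, though it does not avoid the API usage.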

Comment on lines 575 to 577
return { operation: 'scrape', ...base };
}
}
Copilot AI Dec 16, 2025

The default case returns an operation type of 'scrape' which could be misleading if an unknown operation somehow reaches this point. While TypeScript's type system should prevent this, consider either throwing an error or using a type assertion to indicate this is unreachable. For example: throw new Error('Unreachable: invalid operation type') or using operation as 'scrape' with a comment explaining this is a fallback that should never occur.
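
The "throw in the default case" suggestion is commonly written as an exhaustiveness check on `never`. A minimal sketch follows; `Operation`, `assertUnreachable`, and `labelResult` are hypothetical names, not code from the PR.

```typescript
type Operation = 'scrape' | 'batch' | 'crawl' | 'map' | 'answer';

// Hypothetical helper: only callable when `value` has been narrowed to `never`,
// i.e. when every member of the union is already handled.
function assertUnreachable(value: never): never {
  throw new Error(`Unreachable: invalid operation type ${String(value)}`);
}

function labelResult(operation: Operation): { operation: Operation } {
  switch (operation) {
    case 'scrape':
    case 'batch':
    case 'crawl':
    case 'map':
    case 'answer':
      return { operation };
    default:
      // Compile-time error here if a new Operation member is added but not handled;
      // at runtime, an unexpected value throws instead of being reported as 'scrape'.
      return assertUnreachable(operation);
  }
}
```

Compared with falling back to `'scrape'`, this makes a missing case fail loudly at both compile time and runtime.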

@ZeeshanAdilButt ZeeshanAdilButt force-pushed the feat/olostep-integration branch from a67e345 to a29b5d3 Compare December 17, 2025 06:38