feat(olostep): add Olostep service bubble for web scraping and AI con… #223
base: main
Suggested PR title from Pearl
Summary
Adds a new service bubble integration for the Olostep API, enabling web scraping and AI-powered content extraction.
Features
Implements 5 core operations: scrape, batch, crawl, map, and answer.
Changes
Use Cases
Technical Details
e69c95c to a67e345
Pull request overview
This PR adds comprehensive integration for Olostep, a web scraping and AI-powered content extraction service, to the BubbleLab platform. The implementation follows established patterns in the codebase and includes proper credential management, schema definitions, factory registration, and test coverage.
Key Changes:
- Adds OlostepBubble service with 5 operations: scrape, batch, crawl, map, and answer
- Introduces OLOSTEP_API_KEY credential type with full configuration across packages
- Registers the bubble in the factory and exports it from the core package
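To illustrate why a discriminated union on `operation` suits these five operations, here is a minimal sketch in plain TypeScript. It is not the PR's Zod schema: the type name `OlostepParamsSketch` and the helper `targetOf` are hypothetical, and the field names are only inferred from the parameters used in the PR's tests.

```typescript
// Hypothetical parameter shapes (assumption: inferred from the test inputs,
// not copied from the PR's Zod schema).
type OlostepParamsSketch =
  | { operation: 'scrape'; url: string; formats?: string[] }
  | { operation: 'batch'; urls: string[] }
  | { operation: 'crawl'; start_url: string; max_pages?: number }
  | { operation: 'map'; url: string; top_n?: number }
  | { operation: 'answer'; task: string; context_urls?: string[] };

// Hypothetical helper: returns the main input of a request.
// Switching on the discriminator narrows `params`, so each branch can only
// read the fields that exist for that operation.
function targetOf(params: OlostepParamsSketch): string {
  switch (params.operation) {
    case 'scrape':
    case 'map':
      return params.url;
    case 'batch':
      return params.urls.join(', ');
    case 'crawl':
      return params.start_url;
    case 'answer':
      return params.task;
  }
}
```

With this shape, reading `params.urls` inside the `'scrape'` branch is a compile-time error, which is the main benefit of keying every variant on a single `operation` field.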
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| packages/bubble-shared-schemas/src/types.ts | Adds OLOSTEP_API_KEY credential type and 'olostep' to BubbleName union type |
| packages/bubble-shared-schemas/src/credential-schema.ts | Registers OLOSTEP_API_KEY in credential environment mapping and bubble credential options |
| packages/bubble-shared-schemas/src/bubble-definition-schema.ts | Adds empty credential configuration map entry for OLOSTEP_API_KEY |
| packages/bubble-core/src/bubbles/service-bubble/olostep.ts | Implements OlostepBubble class with discriminated union schemas for 5 operations, API client methods, and error handling |
| packages/bubble-core/src/bubbles/service-bubble/olostep.test.ts | Provides comprehensive unit tests covering registration, schemas, metadata, credentials, and all 5 operations |
| packages/bubble-core/src/bubble-factory.ts | Registers OlostepBubble in factory and includes it in code generator bubble list |
| packages/bubble-core/src/index.ts | Exports OlostepBubble class and OlostepParamsInput type from package |
| apps/bubble-studio/src/pages/CredentialsPage.tsx | Adds UI configuration for OLOSTEP_API_KEY credential with label, description, and placeholder |
describe('OlostepBubble', () => {
  //
  // REGISTRATION & SCHEMA
  //
  describe('Registration & Schema', () => {
    it('should be registered in BubbleRegistry', async () => {
      const bubbleClass = factory.get('olostep');
      expect(bubbleClass).toBeDefined();
      expect(bubbleClass).toBe(OlostepBubble);
    });

    it('schema should be a Zod discriminated union based on "operation"', () => {
      const schema = OlostepBubble.schema;
      expect(schema).toBeDefined();

      // Validate ZodDiscriminatedUnion
      expect(schema instanceof ZodDiscriminatedUnion).toBe(true);

      const du = schema as ZodDiscriminatedUnion<
        'operation',
        readonly ZodObject<any>[]
      >;
      expect(du.discriminator).toBe('operation');

      const operationValues = du.options.map((o) => o.shape.operation.value);

      expect(operationValues).toContain('scrape');
      expect(operationValues).toContain('batch');
      expect(operationValues).toContain('crawl');
      expect(operationValues).toContain('map');
      expect(operationValues).toContain('answer');
    });

    it('result schema should validate a sample scrape result', () => {
      const sample = {
        operation: 'scrape',
        success: true,
        error: '',
        markdown_content: '# Hello World',
      };

      const parsed = OlostepBubble.resultSchema.safeParse(sample);
      expect(parsed.success).toBe(true);
    });
  });

  //
  // METADATA
  //
  describe('Metadata Tests', () => {
    it('should have correct metadata', () => {
      const metadata = factory.getMetadata('olostep');

      expect(metadata).toBeDefined();
      expect(metadata?.name).toBe('olostep');
      expect(metadata?.alias).toBe('web-scraper');
      expect(metadata?.schema).toBeDefined();
      expect(metadata?.resultSchema).toBeDefined();
      expect(metadata?.shortDescription).toContain('Web scraping');
      expect(metadata?.longDescription).toContain('Olostep');
    });

    it('static properties are correct', () => {
      expect(OlostepBubble.bubbleName).toBe('olostep');
      expect(OlostepBubble.alias).toBe('web-scraper');
      expect(OlostepBubble.service).toBe('olostep');
      expect(OlostepBubble.authType).toBe('apikey');
      expect(OlostepBubble.type).toBe('service');
      expect(OlostepBubble.schema).toBeDefined();
      expect(OlostepBubble.resultSchema).toBeDefined();
      expect(OlostepBubble.shortDescription).toContain('Web scraping');
      expect(OlostepBubble.longDescription).toContain('Scrape');
    });
  });

  //
  // CREDENTIAL VALIDATION
  //
  describe('Credential Validation', () => {
    it('should fail testCredential() with missing credentials', async () => {
      const bubble = new OlostepBubble({
        operation: 'scrape',
        url: 'https://example.com',
      });

      const result = await bubble.testCredential();
      expect(result).toBe(false);
    });

    it('should pass testCredential() with valid credentials', async () => {
      mockFetch.mockResolvedValueOnce({
        ok: true,
        text: async () => JSON.stringify({ markdown_content: '# Test' }),
      });

      const bubble = new OlostepBubble({
        operation: 'scrape',
        url: 'https://example.com',
        credentials: createTestCredentials(),
      });

      const result = await bubble.testCredential();
      expect(result).toBe(true);
    });
  });

  //
  // SCRAPE OPERATION
  //
  describe('Scrape Operation', () => {
    it('should create bubble with scrape operation', () => {
      const params: OlostepParamsInput = {
        operation: 'scrape',
        url: 'https://example.com',
        formats: ['markdown'],
      };

      const bubble = new OlostepBubble(params);
      expect((bubble as any).params.operation).toBe('scrape');
      expect((bubble as any).params.url).toBe('https://example.com');
    });

    it('should accept all scrape optional parameters', () => {
      const params: OlostepParamsInput = {
        operation: 'scrape',
        url: 'https://example.com',
        formats: ['markdown', 'html'],
        country: 'US',
        wait_before_scraping: 2000,
        parser: '@olostep/product-page',
      };

      const bubble = new OlostepBubble(params);
      expect((bubble as any).params.formats).toEqual(['markdown', 'html']);
      expect((bubble as any).params.country).toBe('US');
      expect((bubble as any).params.wait_before_scraping).toBe(2000);
      expect((bubble as any).params.parser).toBe('@olostep/product-page');
    });

    it('should return success for scrape', async () => {
      mockFetch.mockResolvedValueOnce({
        ok: true,
        text: async () =>
          JSON.stringify({
            markdown_content: '# Test Page',
            metadata: { title: 'Test', url: 'https://example.com' },
          }),
      });

      const bubble = new OlostepBubble({
        operation: 'scrape',
        url: 'https://example.com',
        credentials: createTestCredentials(),
      });

      const res = await bubble.action();

      expect(res.data.operation).toBe('scrape');
      expect(res.success).toBe(true);
      expect(res.error).toBe('');
      expect(res.data.markdown_content).toBeDefined();
    });
  });

  //
  // BATCH OPERATION
  //
  describe('Batch Operation', () => {
    it('should create bubble with batch operation', () => {
      const params: OlostepParamsInput = {
        operation: 'batch',
        urls: ['https://example.com/1', 'https://example.com/2'],
      };

      const bubble = new OlostepBubble(params);
      expect((bubble as any).params.operation).toBe('batch');
      expect((bubble as any).params.urls).toHaveLength(2);
    });

    it('should return success for batch', async () => {
      mockFetch.mockResolvedValueOnce({
        ok: true,
        text: async () =>
          JSON.stringify({
            batch_id: 'batch_123',
            status: 'processing',
          }),
      });

      const bubble = new OlostepBubble({
        operation: 'batch',
        urls: ['https://example.com/1', 'https://example.com/2'],
        credentials: createTestCredentials(),
      });

      const res = await bubble.action();

      expect(res.data.operation).toBe('batch');
      expect(res.success).toBe(true);
      expect(res.data.batch_id).toBeDefined();
    });
  });

  //
  // CRAWL OPERATION
  //
  describe('Crawl Operation', () => {
    it('should create bubble with crawl operation', () => {
      const params: OlostepParamsInput = {
        operation: 'crawl',
        start_url: 'https://example.com',
        max_pages: 50,
      };

      const bubble = new OlostepBubble(params);
      expect((bubble as any).params.operation).toBe('crawl');
      expect((bubble as any).params.start_url).toBe('https://example.com');
      expect((bubble as any).params.max_pages).toBe(50);
    });

    it('should return success for crawl', async () => {
      mockFetch.mockResolvedValueOnce({
        ok: true,
        text: async () =>
          JSON.stringify({
            crawl_id: 'crawl_123',
            status: 'completed',
            pages_crawled: 10,
          }),
      });

      const bubble = new OlostepBubble({
        operation: 'crawl',
        start_url: 'https://example.com',
        credentials: createTestCredentials(),
      });

      const res = await bubble.action();

      expect(res.data.operation).toBe('crawl');
      expect(res.success).toBe(true);
      expect(res.data.crawl_id).toBeDefined();
    });
  });

  //
  // MAP OPERATION
  //
  describe('Map Operation', () => {
    it('should create bubble with map operation', () => {
      const params: OlostepParamsInput = {
        operation: 'map',
        url: 'https://example.com',
        top_n: 200,
      };

      const bubble = new OlostepBubble(params);
      expect((bubble as any).params.operation).toBe('map');
      expect((bubble as any).params.url).toBe('https://example.com');
      expect((bubble as any).params.top_n).toBe(200);
    });

    it('should return success for map', async () => {
      mockFetch.mockResolvedValueOnce({
        ok: true,
        text: async () =>
          JSON.stringify({
            urls: ['https://example.com/page1', 'https://example.com/page2'],
            total_urls: 2,
          }),
      });

      const bubble = new OlostepBubble({
        operation: 'map',
        url: 'https://example.com',
        credentials: createTestCredentials(),
      });

      const res = await bubble.action();

      expect(res.data.operation).toBe('map');
      expect(res.success).toBe(true);
      expect(res.data.urls).toBeDefined();
      expect(res.data.urls!.length).toBeGreaterThan(0);
    });
  });

  //
  // ANSWER OPERATION
  //
  describe('Answer Operation', () => {
    it('should create bubble with answer operation', () => {
      const params: OlostepParamsInput = {
        operation: 'answer',
        task: 'What is the main topic of this website?',
        context_urls: ['https://example.com'],
      };

      const bubble = new OlostepBubble(params);
      expect((bubble as any).params.operation).toBe('answer');
      expect((bubble as any).params.task).toContain('main topic');
    });

    it('should return success for answer', async () => {
      mockFetch.mockResolvedValueOnce({
        ok: true,
        text: async () =>
          JSON.stringify({
            answer: 'This website is about example content.',
            citations: [{ url: 'https://example.com', title: 'Example' }],
            sources_used: 1,
          }),
      });

      const bubble = new OlostepBubble({
        operation: 'answer',
        task: 'What is this website about?',
        credentials: createTestCredentials(),
      });

      const res = await bubble.action();

      expect(res.data.operation).toBe('answer');
      expect(res.success).toBe(true);
      expect(res.data.answer).toBeDefined();
    });
  });
});
Copilot
AI
Dec 16, 2025
The test suite is missing error handling tests. Consider adding tests for:
- API request failures (non-2xx responses)
- Network errors (fetch throws exception)
- Invalid API key responses
- Malformed response data
Other service bubbles like ElevenLabsBubble include these types of tests to ensure robust error handling.
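The four failure modes listed above can be exercised without a live API by injecting a stubbed fetch. The sketch below is illustrative only: `scrapeWith`, `respond`, `networkDown`, `runFailureCases`, and the error strings are hypothetical stand-ins, not the PR's `OlostepBubble` implementation, and the result shape only loosely mirrors the `success`/`error` fields seen in the tests.

```typescript
// Minimal response/fetch shapes so the sketch needs no DOM or Node typings.
interface ResponseLike {
  ok: boolean;
  status: number;
  text(): Promise<string>;
}
type FetchLike = (url: string) => Promise<ResponseLike>;

interface ScrapeResult {
  success: boolean;
  error: string;
  markdown_content?: string;
}

// Hypothetical stand-in for a scrape call (assumption: not the PR's code).
async function scrapeWith(fetchFn: FetchLike, url: string): Promise<ScrapeResult> {
  try {
    const response = await fetchFn(url);
    const body = await response.text();
    if (!response.ok) {
      // Non-2xx responses, including an invalid API key (e.g. 401).
      return { success: false, error: `HTTP ${response.status}: ${body}` };
    }
    const parsed = JSON.parse(body) as { markdown_content?: unknown };
    if (typeof parsed.markdown_content !== 'string') {
      return { success: false, error: 'Malformed response: missing markdown_content' };
    }
    return { success: true, error: '', markdown_content: parsed.markdown_content };
  } catch (err) {
    // Covers fetch rejections (network errors) and JSON.parse failures.
    const message = err instanceof Error ? err.message : String(err);
    return { success: false, error: message };
  }
}

// Stub that resolves with a fixed status and body.
const respond = (status: number, body: string): FetchLike => async () => ({
  ok: status >= 200 && status < 300,
  status,
  text: async () => body,
});

// Stub that rejects, as fetch does when the network is unavailable.
const networkDown: FetchLike = async () => {
  throw new Error('network unreachable');
};

// One call per failure mode from the review comment.
async function runFailureCases(): Promise<ScrapeResult[]> {
  return Promise.all([
    scrapeWith(respond(500, 'Internal Server Error'), 'https://example.com'),
    scrapeWith(networkDown, 'https://example.com'),
    scrapeWith(respond(401, 'Invalid API key'), 'https://example.com'),
    scrapeWith(respond(200, 'not json'), 'https://example.com'),
  ]);
}
```

In the PR's own suite, the same cases would presumably be driven through `mockFetch` (a resolved response with `ok: false`, or a rejected promise) and asserted on `res.success` and `res.error`.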
    const apiKey = this.chooseCredential();
    if (!apiKey) return false;

    try {
      // Simple health check with minimal scrape
      const response = await fetch(`${OLOSTEP_API_URL}/scrapes`, {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${apiKey}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          url_to_scrape: 'https://example.com',
          formats: ['text'],
        }),
      });
      return response.ok;
    } catch {
      return false;
    }
  }
Copilot
AI
Dec 16, 2025
The testCredential method makes a full scrape request to example.com, which could be slow and might consume API credits. Consider using a lighter-weight endpoint for credential validation, such as a health check or user info endpoint if available from the Olostep API. If no such endpoint exists, this approach is acceptable but may result in slower credential validation and unnecessary API usage.
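One way to keep the validation logic independent of which endpoint is probed is to inject the probe and bound how long it may take. This is a sketch under assumptions: `testCredentialWith`, `Probe`, and the 5-second default are hypothetical, and whether Olostep offers a cheaper endpoint than `/scrapes` is not established here.

```typescript
// Hypothetical probe contract: any request that succeeds only with a valid key.
interface ProbeResponse {
  ok: boolean;
  status: number;
}
type Probe = (apiKey: string) => Promise<ProbeResponse>;

// Returns true only if the probe answers with an OK status before the timeout.
// A missing key, a rejected probe, or a timeout all count as "not valid".
async function testCredentialWith(
  apiKey: string | undefined,
  probe: Probe,
  timeoutMs = 5000
): Promise<boolean> {
  if (!apiKey) return false;

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<false>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });

  try {
    // Whichever settles first decides the result.
    return await Promise.race([probe(apiKey).then((r) => r.ok), timeout]);
  } catch {
    return false;
  } finally {
    // Clear the timer so it does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

With this shape, the current minimal scrape request can serve as the probe today, and a lighter endpoint can replace it later without changing the callers.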
        return { operation: 'scrape', ...base };
    }
  }
Copilot
AI
Dec 16, 2025
The default case returns an operation type of 'scrape' which could be misleading if an unknown operation somehow reaches this point. While TypeScript's type system should prevent this, consider either throwing an error or using a type assertion to indicate this is unreachable. For example: throw new Error('Unreachable: invalid operation type') or using operation as 'scrape' with a comment explaining this is a fallback that should never occur.
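The "throw in the default case" option can be combined with a `never`-typed helper so the compiler also flags any operation that is added later but not handled. This is a minimal sketch: `Operation`, `assertNever`, and `errorResultFor` are hypothetical names, and the result shape is an assumption rather than the PR's actual type.

```typescript
type Operation = 'scrape' | 'batch' | 'crawl' | 'map' | 'answer';

interface ErrorResult {
  operation: Operation;
  success: false;
  error: string;
}

// Accepts only `never`: if a case is missing from the switch below,
// the call in `default` no longer type-checks.
function assertNever(value: never): never {
  throw new Error(`Unreachable: invalid operation type: ${String(value)}`);
}

// Hypothetical error-result builder with an explicit branch per operation.
function errorResultFor(operation: Operation, message: string): ErrorResult {
  const base = { success: false as const, error: message };
  switch (operation) {
    case 'scrape':
      return { operation: 'scrape', ...base };
    case 'batch':
      return { operation: 'batch', ...base };
    case 'crawl':
      return { operation: 'crawl', ...base };
    case 'map':
      return { operation: 'map', ...base };
    case 'answer':
      return { operation: 'answer', ...base };
    default:
      // Fails loudly at runtime instead of mislabeling the result as 'scrape'.
      return assertNever(operation);
  }
}
```

If a value outside the union does reach the switch at runtime (for example through an `as` cast), the helper throws rather than returning a result tagged with the wrong operation.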
…tent extraction
- Add OlostepBubble with 5 operations: scrape, batch, crawl, map, answer
- Add OLOSTEP_API_KEY credential type and configuration
- Add 'olostep' to BubbleName type
- Register bubble in BubbleFactory
- Export from bubble-core index
- Add comprehensive unit tests
- Add credential UI configuration in bubble-studio
a67e345 to a29b5d3
…tent extraction
Summary
Adds Olostep integration to BubbleLab - web scraping and AI-powered content extraction with 5 operations: scrape, batch, crawl, map, and answer.
Type of Change
Checklist
pnpm check and all tests pass (pre-existing lint errors in bubblelab-api unrelated to this PR)
Screenshots (if applicable)
Additional Context