
srdanirz/imagetotext-solver


ImageToText Solver

Self-hosted OCR API built with Tesseract.js to solve ImageToText captchas efficiently and cost-effectively.

[Image: Captcha Examples]



Why This Exists

This project was created as an affordable alternative to commercial captcha-solving services, specifically for ImageToText captchas (text in images).

[Image: Captcha Types]

The Problem:

  • Third-party services charge per captcha solved (~$0.001-$0.003 per captcha)
  • Costs scale rapidly with high request volumes (thousands of captchas/day)
  • Critical external dependency in your infrastructure
  • You only need to solve simple text captchas, not complex reCAPTCHAs

The Solution:

  • Self-hosted API with Tesseract.js for ImageToText captchas
  • Fixed cost (only hosting ~$7/month)
  • No request limits beyond configured rate limiting
  • Full control over infrastructure
  • Processes base64 images and returns filtered text

Result: ~90% cost reduction for simple text captchas while keeping the same image-in, text-out workflow.

Supported Captcha Types

This solver is optimized for ImageToText captchas:

  • ✅ Simple text in images (letters and numbers)
  • ✅ Basic alphanumeric captchas
  • ✅ Captchas with solid color backgrounds
  • ❌ reCAPTCHA (requires specialized services)
  • ❌ hCaptcha (requires specialized services)
  • ❌ Captchas with severe distortion

How It Works

The solver processes captcha images in several steps:

1. Processing with Tesseract.js

The image is sent as base64 and Tesseract.js extracts the text:

Before filtering (raw text):

[Image: Before Filter]

OCR may detect extra characters, spaces, or unwanted symbols.

After filtering (clean text):

[Image: After Filter]

Automatic filtering removes everything except A-Z and 0-9, leaving only the captcha code.
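The filtering step can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function name `filterCaptchaText` is hypothetical, and keeping lowercase letters is an assumption, since the description above only mentions A-Z and 0-9.

```javascript
// Hypothetical helper: strip everything that is not a letter or a digit.
// Assumption: lowercase letters are kept as-is; the real filter may differ.
function filterCaptchaText(raw) {
  return String(raw).replace(/[^A-Za-z0-9]/g, '');
}

console.log(filterCaptchaText('  A B C 1 2 3  ')); // ABC123
console.log(filterCaptchaText('AB-C1\n23!'));      // ABC123
```

Whitespace, punctuation, and line breaks introduced by OCR are dropped in one pass, so the result can be submitted as the captcha answer directly.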

2. Solver in Action

Example 1: Simple captcha

[Image: Solver Example 1]

Example 2: Captcha with background

[Image: Solver Example 2]

The API returns the filtered text ready to use:

{
  "text": "ABC123",
  "raw": "  A B C 1 2 3  ",
  "confidence": "high"
}

Features

  • ✅ Fast OCR with Tesseract.js
  • ✅ Simple REST API (POST endpoint)
  • ✅ Built-in rate limiting (100 req/15min per IP)
  • ✅ Automatic non-alphanumeric character filtering
  • ✅ Heroku deployment ready
  • ✅ Base64 image support

Tech Stack

  • Runtime: Node.js
  • Framework: Express.js
  • OCR Engine: Tesseract.js (based on Google's Tesseract)
  • Rate Limiting: express-rate-limit
  • Deploy: Heroku compatible (Procfile included)

Installation

1. Clone the repository

git clone https://github.com/srdanirz/imagetotext-solver.git
cd imagetotext-solver

2. Install dependencies

npm install

3. Configure environment variables (optional)

cp .env.example .env

Edit .env:

PORT=3000
RATE_LIMIT_WINDOW_MS=900000  # 15 minutes
RATE_LIMIT_MAX=100           # Max requests per window
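For illustration, here is one way a server could turn these variables into rate-limiter settings. This is a sketch under assumptions: `readRateLimitConfig` is a hypothetical helper, the defaults mirror the values shown above, and the returned `windowMs` / `max` keys are named after express-rate-limit options. How the project actually reads its configuration is not shown in this README.

```javascript
// Hypothetical helper: read rate-limit settings from an env-like object,
// falling back to the documented defaults (100 requests per 15 minutes).
function readRateLimitConfig(env = process.env) {
  const toPositiveInt = (value, fallback) => {
    const parsed = Number.parseInt(value, 10);
    return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
  };
  return {
    windowMs: toPositiveInt(env.RATE_LIMIT_WINDOW_MS, 900000),
    max: toPositiveInt(env.RATE_LIMIT_MAX, 100),
  };
}

const exampleConfig = readRateLimitConfig({ RATE_LIMIT_MAX: '200' });
console.log(exampleConfig); // { windowMs: 900000, max: 200 }
```

Missing or non-numeric values fall back to the defaults instead of disabling the limiter by accident.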

4. Run the server

npm start

The server will be available at http://localhost:3000


API Usage

Endpoint: POST /ocr

Request:

curl -X POST http://localhost:3000/ocr \
  -H "Content-Type: application/json" \
  -d '{
    "imageData": "base64_encoded_image_here"
  }'

Request Body:

{
  "imageData": "iVBORw0KGgoAAAANSUhEUgAA..."
}

Response (Success):

{
  "text": "ABCD1234",
  "raw": "  A B C D 1 2 3 4  ",
  "confidence": "high"
}

Response (Error):

{
  "error": "OCR processing failed",
  "message": "Error processing image"
}
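A client has to tell the two response shapes above apart. The sketch below is one hedged way to do that. `parseOcrResponse` is a hypothetical helper, and the HTTP status codes used in the example (200 for success, 500 for an OCR failure) are assumptions, because this README does not state which codes the server returns.

```javascript
// Hypothetical client-side helper: normalise an /ocr response.
// `status` is the HTTP status code, `bodyText` the raw response body.
function parseOcrResponse(status, bodyText) {
  let body = null;
  try {
    body = JSON.parse(bodyText);
  } catch {
    // Assumption: some responses (e.g. from a rate limiter or proxy) are not JSON.
  }
  if (status === 200 && body && typeof body.text === 'string') {
    return { ok: true, text: body.text, raw: body.raw, confidence: body.confidence };
  }
  const reason = body && body.error
    ? [body.error, body.message].filter(Boolean).join(': ')
    : bodyText;
  return { ok: false, status, reason };
}

console.log(parseOcrResponse(200, '{"text":"ABCD1234","raw":"  A B C D 1 2 3 4  ","confidence":"high"}'));
console.log(parseOcrResponse(500, '{"error":"OCR processing failed","message":"Error processing image"}'));
```

Returning a single `{ ok, ... }` shape lets calling code branch once instead of checking the status code and the body separately.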

Endpoint: GET /health

Health check endpoint for monitoring.

Response:

{
  "status": "ok",
  "uptime": 12345.67,
  "timestamp": "2025-11-01T19:30:00.000Z"
}
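A payload of this shape can be produced with built-in Node.js calls only. The following is a sketch, not the project's handler: `buildHealthPayload` is a hypothetical name, and using `process.uptime()` for the `uptime` field is an assumption based on the sample values above.

```javascript
// Hypothetical sketch of the payload a /health handler could return.
// process.uptime() reports the seconds elapsed since the Node.js process started.
function buildHealthPayload(now = new Date()) {
  return {
    status: 'ok',
    uptime: process.uptime(),
    timestamp: now.toISOString(),
  };
}

console.log(buildHealthPayload(new Date('2025-11-01T19:30:00.000Z')));
```

Passing the clock in as a parameter keeps the function easy to test. A real handler would call it with no arguments.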

Usage Examples

JavaScript (Node.js)

const fs = require('fs');
const axios = require('axios');

// Read image and convert to base64
const imageBuffer = fs.readFileSync('captcha.png');
const base64Image = imageBuffer.toString('base64');

// Make request to OCR server
axios.post('http://localhost:3000/ocr', {
  imageData: base64Image
})
.then(response => {
  console.log('Recognized text:', response.data.text);
})
.catch(error => {
  console.error('Error:', error.message);
});

Python

import base64
import requests

# Read image and convert to base64
with open('captcha.png', 'rb') as image_file:
    base64_image = base64.b64encode(image_file.read()).decode('utf-8')

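# Make request to OCR server (the timeout keeps a stalled server from hanging the script)
response = requests.post(
    'http://localhost:3000/ocr',
    json={'imageData': base64_image},
    timeout=30,
)

if response.status_code == 200:
    print('Recognized text:', response.json()['text'])
else:
    # Error bodies are not guaranteed to be JSON, so print the raw text
    print('Error:', response.status_code, response.text)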

Deploy to Heroku

This project is Heroku-ready:

# Login to Heroku
heroku login

# Create app
heroku create my-ocr-server

# Deploy
git push heroku main

# Check logs
heroku logs --tail

Environment variables on Heroku:

heroku config:set RATE_LIMIT_MAX=200
heroku config:set RATE_LIMIT_WINDOW_MS=900000

Comparison: Self-hosted vs Commercial Services

| Feature | ImageToText Solver (Self-hosted) | Commercial Services |
| --- | --- | --- |
| Cost | Free (only hosting ~$7/month) | ~$0.001-$0.003 per captcha |
| Scalability | Unlimited (depends on your server) | Unlimited (pay per use) |
| Latency | Low (local/VPS) | Medium (external API) |
| Accuracy | 80-95% (simple captchas) | 95-99% (all types) |
| Maintenance | You maintain it | No maintenance |
| Dependency | Self-hosted | Third-party service |
| Types supported | Text-only | All (reCAPTCHA, hCaptcha, etc.) |

When to use ImageToText Solver:

  • ✅ Simple text captchas
  • ✅ High request volume
  • ✅ Limited budget
  • ✅ Need full control

When to use commercial services:

  • ✅ Complex captchas (reCAPTCHA, hCaptcha, FunCaptcha)
  • ✅ Low volume (few requests per day)
  • ✅ Need guaranteed maximum accuracy
  • ✅ Don't want to maintain infrastructure
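The split between "high volume" and "low volume" can be made concrete with the prices quoted in this README. The sketch below computes the monthly volume at which the fixed hosting cost equals the per-captcha fees. The function name is hypothetical, and prices are expressed per 1,000 captchas ($1 to $3) to keep the arithmetic in whole numbers.

```javascript
// Back-of-the-envelope comparison using the figures quoted in this README.
const HOSTING_PER_MONTH = 7;    // USD, fixed self-hosting cost
const PRICE_PER_1000_LOW = 1;   // USD per 1,000 captchas (~$0.001 each)
const PRICE_PER_1000_HIGH = 3;  // USD per 1,000 captchas (~$0.003 each)

// Smallest monthly volume at which per-captcha fees reach the hosting cost.
function breakEvenVolume(hostingPerMonth, pricePer1000) {
  return Math.ceil((hostingPerMonth / pricePer1000) * 1000);
}

console.log(breakEvenVolume(HOSTING_PER_MONTH, PRICE_PER_1000_LOW));  // 7000
console.log(breakEvenVolume(HOSTING_PER_MONTH, PRICE_PER_1000_HIGH)); // 2334
```

Under these figures, self-hosting starts to pay off somewhere between roughly 2,300 and 7,000 captchas per month, before accounting for maintenance time or the accuracy gap.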

Real-World Use Cases

This solver has been used in production for:

  • ✅ Testing and development of captcha systems
  • ✅ Security research and penetration testing
  • ✅ Automated testing of web applications
  • ✅ OCR text extraction for data processing

Real-world savings: from ~$500/month with third-party services to ~$7/month for a Heroku dyno (hosting).


Contributing

Contributions are welcome:

  1. Fork the project
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

MIT License - Use this code however you want.


Credits

  • OCR Engine: Tesseract.js - JavaScript port of Google's Tesseract
  • Developed by: srdanirz
  • Context: Created as an affordable alternative to commercial captcha-solving services

Disclaimer

This project is intended for educational purposes, security research, and testing of systems you own or have authorization to test. Users are responsible for ensuring compliance with applicable laws and terms of service.



