ImageToText Solver

Self-hosted OCR API built with Tesseract.js to solve ImageToText captchas efficiently and cost-effectively.

Why This Exists

This project was created as an affordable alternative to commercial captcha-solving services, specifically for ImageToText captchas (text in images).

The Problem:

Third-party services charge per captcha solved (~$0.001-$0.003 per captcha)
Costs scale rapidly with high request volumes (thousands of captchas/day)
Critical external dependency in your infrastructure
You only need to solve simple text captchas, not complex reCAPTCHAs

The Solution:

Self-hosted API with Tesseract.js for ImageToText captchas
Fixed cost (only hosting ~$7/month)
No request limits beyond configured rate limiting
Full control over infrastructure
Processes base64 images and returns filtered text

Result: ~90% cost reduction for simple text captchas, with somewhat lower accuracy than commercial services (see the comparison table below).

Supported Captcha Types

This solver is optimized for ImageToText captchas:

✅ Simple text in images (letters and numbers)
✅ Basic alphanumeric captchas
✅ Captchas with solid color backgrounds
❌ reCAPTCHA (requires specialized services)
❌ hCaptcha (requires specialized services)
❌ Captchas with severe distortion

How It Works

The solver processes captcha images in several steps:

1. Processing with Tesseract.js

The image is sent as base64 and Tesseract.js extracts the text.

Before filtering (raw text): OCR may detect extra characters, spaces, or unwanted symbols.

After filtering (clean text): Automatic filtering removes everything except A-Z and 0-9, leaving only the captcha code.
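
The filtering step can be sketched as a single regular expression. This is a minimal illustration, not the project's actual code: the function name `filterCaptchaText` is hypothetical, and the sketch takes the description literally, so lowercase letters are dropped along with spaces and symbols.

```javascript
// Hypothetical helper: keeps only the characters A-Z and 0-9.
// Taken literally from the description, so lowercase letters,
// whitespace, and punctuation are all removed.
function filterCaptchaText(rawText) {
  return String(rawText).replace(/[^A-Z0-9]/g, '');
}

console.log(filterCaptchaText('  A B C 1 2 3  ')); // prints ABC123
console.log(filterCaptchaText('AB-C1?2\n3'));      // prints ABC123
```

If the captchas you target are case-insensitive, uppercasing the raw text before filtering would be a natural variation.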

2. Solver in Action

Example 1: Simple captcha

Example 2: Captcha with background

The API returns the filtered text ready to use:

{
  "text": "ABC123",
  "raw": "  A B C 1 2 3  ",
  "confidence": "high"
}

Features

✅ Fast OCR with Tesseract.js
✅ Simple REST API (POST endpoint)
✅ Built-in rate limiting (100 req/15min per IP)
✅ Automatic non-alphanumeric character filtering
✅ Heroku deployment ready
✅ Base64 image support
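
To make the "100 req/15min per IP" policy concrete, here is a dependency-free sketch of a fixed-window counter. The project itself relies on express-rate-limit; the class below (`FixedWindowLimiter`, a hypothetical name) is not that library's code and only illustrates what such a limit means in practice.

```javascript
// Illustration only: a fixed-window request counter keyed by client IP.
// Defaults mirror the documented policy (100 requests per 15 minutes).
class FixedWindowLimiter {
  constructor({ windowMs = 15 * 60 * 1000, max = 100 } = {}) {
    this.windowMs = windowMs;
    this.max = max;
    this.hits = new Map(); // ip -> { windowStart, count }
  }

  // Returns true if the request is allowed, false once the IP is over quota.
  allow(ip, now = Date.now()) {
    const entry = this.hits.get(ip);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // First request from this IP, or the previous window has expired.
      this.hits.set(ip, { windowStart: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.max;
  }
}

const limiter = new FixedWindowLimiter();
let allowed = 0;
for (let i = 0; i < 101; i++) {
  if (limiter.allow('203.0.113.7', 0)) allowed += 1;
}
console.log(allowed); // prints 100: the 101st request in the window is rejected
console.log(limiter.allow('203.0.113.7', 15 * 60 * 1000)); // prints true: a new window has started
```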

Tech Stack

Runtime: Node.js
Framework: Express.js
OCR Engine: Tesseract.js (based on Google's Tesseract)
Rate Limiting: express-rate-limit
Deploy: Heroku compatible (Procfile included)

Installation

1. Clone the repository

git clone https://github.com/srdanirz/imagetotext-solver.git
cd imagetotext-solver

2. Install dependencies

npm install

3. Configure environment variables (optional)

cp .env.example .env

Edit .env:

PORT=3000
# Rate-limit window in milliseconds (900000 = 15 minutes)
RATE_LIMIT_WINDOW_MS=900000
# Maximum requests per IP within one window
RATE_LIMIT_MAX=100
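
One way a server could read these settings is sketched below. How app.js actually parses them is not shown in this README, so `loadConfig` is a hypothetical helper; the defaults are the values from the example file.

```javascript
// Hypothetical helper: reads the three settings from an environment object,
// falling back to the documented defaults when a value is missing or is not
// a positive integer.
function loadConfig(env = process.env) {
  const toPositiveInt = (value, fallback) => {
    const parsed = Number.parseInt(value, 10);
    return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
  };
  return {
    port: toPositiveInt(env.PORT, 3000),
    rateLimitWindowMs: toPositiveInt(env.RATE_LIMIT_WINDOW_MS, 900000),
    rateLimitMax: toPositiveInt(env.RATE_LIMIT_MAX, 100),
  };
}

console.log(loadConfig({ RATE_LIMIT_MAX: '200' }));
// prints { port: 3000, rateLimitWindowMs: 900000, rateLimitMax: 200 }
```

Validating the values at startup means a typo in .env falls back to a known default instead of silently disabling the limit.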

4. Run the server

npm start

The server will be available at http://localhost:3000

API Usage

Endpoint: POST `/ocr`

Request:

curl -X POST http://localhost:3000/ocr \
  -H "Content-Type: application/json" \
  -d '{
    "imageData": "base64_encoded_image_here"
  }'

Request Body:

{
  "imageData": "iVBORw0KGgoAAAANSUhEUgAA..."
}
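
The request body can be built from a file buffer or from an existing base64 string. The sketch below assumes the server expects bare base64 without a `data:` URL prefix, as in the example above; `buildOcrRequestBody` is a hypothetical helper, not part of the project.

```javascript
// Hypothetical helper: turns a Buffer or a base64/data-URL string into the
// JSON body for POST /ocr. Assumes the server wants bare base64.
function buildOcrRequestBody(image) {
  if (Buffer.isBuffer(image)) {
    return { imageData: image.toString('base64') };
  }
  if (typeof image !== 'string' || image.length === 0) {
    throw new TypeError('image must be a Buffer or a non-empty base64 string');
  }
  // Strip an optional "data:image/png;base64," style prefix.
  const bare = image.replace(/^data:[^;,]+;base64,/, '').trim();
  if (!/^[A-Za-z0-9+/]+={0,2}$/.test(bare)) {
    throw new TypeError('image does not look like base64 data');
  }
  return { imageData: bare };
}

// The 8-byte PNG signature encodes to the "iVBORw0KGgo" prefix seen above.
const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
console.log(buildOcrRequestBody(pngSignature)); // prints { imageData: 'iVBORw0KGgo=' }
```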

Response (Success):

{
  "text": "ABCD1234",
  "raw": "  A B C D 1 2 3 4  ",
  "confidence": "high"
}

Response (Error):

{
  "error": "OCR processing failed",
  "message": "Error processing image"
}
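
A client usually wants one code path for both response shapes. The following sketch normalizes them; the field names come from the examples above, while `interpretOcrResponse` and the returned `ok`/`reason` fields are hypothetical and not part of the API.

```javascript
// Hypothetical client-side helper: maps the success and error bodies
// into a single { ok, ... } shape.
function interpretOcrResponse(body) {
  if (body && typeof body.text === 'string') {
    if (body.text.length === 0) {
      // Filtering left nothing, so there is no usable captcha code.
      return { ok: false, reason: 'OCR returned no alphanumeric characters' };
    }
    return { ok: true, text: body.text, confidence: body.confidence ?? null };
  }
  if (body && typeof body.error === 'string') {
    const detail = body.message ? `: ${body.message}` : '';
    return { ok: false, reason: `${body.error}${detail}` };
  }
  return { ok: false, reason: 'Unexpected response shape' };
}

console.log(interpretOcrResponse({ text: 'ABCD1234', raw: '  A B C D 1 2 3 4  ', confidence: 'high' }));
console.log(interpretOcrResponse({ error: 'OCR processing failed', message: 'Error processing image' }));
```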

Endpoint: GET `/health`

Health check endpoint for monitoring.

Response:

{
  "status": "ok",
  "uptime": 12345.67,
  "timestamp": "2025-11-01T19:30:00.000Z"
}
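
A monitoring script might validate the health payload along these lines. This is a sketch under assumptions: `isHealthy` is a hypothetical helper, and the 60-second `maxSkewMs` tolerance is an illustrative value, not something the project defines.

```javascript
// Hypothetical monitoring helper: checks a /health payload of the shape
// shown above and rejects stale or malformed reports.
function isHealthy(payload, now = Date.now(), maxSkewMs = 60000) {
  if (!payload || payload.status !== 'ok') return false;
  if (typeof payload.uptime !== 'number' || payload.uptime < 0) return false;
  const reportedAt = Date.parse(payload.timestamp);
  if (Number.isNaN(reportedAt)) return false;
  return Math.abs(now - reportedAt) <= maxSkewMs;
}

const examplePayload = { status: 'ok', uptime: 12345.67, timestamp: '2025-11-01T19:30:00.000Z' };
const oneSecondLater = Date.parse(examplePayload.timestamp) + 1000;
console.log(isHealthy(examplePayload, oneSecondLater)); // prints true
```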

Usage Examples

JavaScript (Node.js)

const fs = require('fs');
const axios = require('axios');

// Read image and convert to base64
const imageBuffer = fs.readFileSync('captcha.png');
const base64Image = imageBuffer.toString('base64');

// Make request to OCR server
axios.post('http://localhost:3000/ocr', {
  imageData: base64Image
})
.then(response => {
  console.log('Recognized text:', response.data.text);
})
.catch(error => {
  console.error('Error:', error.message);
});

Python

import base64
import requests

# Read image and convert to base64
with open('captcha.png', 'rb') as image_file:
    base64_image = base64.b64encode(image_file.read()).decode('utf-8')

# Make request to OCR server
response = requests.post('http://localhost:3000/ocr', json={
    'imageData': base64_image
})

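if response.ok:
    print('Recognized text:', response.json()['text'])
else:
    # Error bodies are not guaranteed to be JSON (for example a rate-limit reply),
    # so print the status code and raw body instead of parsing it
    print('Error:', response.status_code, response.text)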

Deploy to Heroku

This project is Heroku-ready:

# Login to Heroku
heroku login

# Create app
heroku create my-ocr-server

# Deploy
git push heroku main

# Check logs
heroku logs --tail

Environment variables on Heroku:

heroku config:set RATE_LIMIT_MAX=200
heroku config:set RATE_LIMIT_WINDOW_MS=900000

Comparison: Self-hosted vs Commercial Services

Feature	ImageToText Solver (Self-hosted)	Commercial Services
Cost	Fixed hosting cost (~$7/month)	~$0.001-$0.003 per captcha
Scalability	Limited by your server capacity	Unlimited (pay per use)
Latency	Low (local/VPS)	Medium (external API)
Accuracy	80-95% (simple captchas)	95-99% (all types)
Maintenance	You maintain it	No maintenance
Dependency	Self-hosted	Third-party service
Types supported	Text-only	All (reCAPTCHA, hCaptcha, etc.)
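
The cost rows imply a break-even volume, which the short calculation below works out. It uses only the figures from the table (~$7/month hosting, $0.001-$0.003 per captcha); the function names are illustrative, and the calculation ignores accuracy differences and maintenance time.

```javascript
// Figures from the comparison table: ~$7/month hosting, $0.001-$0.003 per captcha.
const HOSTING_PER_MONTH = 7;

// Monthly volume at which pay-per-captcha pricing costs the same as fixed hosting.
function breakEvenVolume(pricePerCaptcha, hostingPerMonth = HOSTING_PER_MONTH) {
  return hostingPerMonth / pricePerCaptcha;
}

// Monthly bill from a per-captcha service at a given volume.
function commercialMonthlyCost(captchasPerMonth, pricePerCaptcha) {
  return captchasPerMonth * pricePerCaptcha;
}

for (const price of [0.001, 0.003]) {
  const volume = Math.round(breakEvenVolume(price));
  console.log(`$${price}/captcha: break-even at about ${volume} captchas/month`);
}
```

Depending on the per-captcha price, self-hosting starts to pay off somewhere between roughly 2,300 and 7,000 captchas per month; below that, a pay-per-use service costs less than the hosting fee.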

When to use ImageToText Solver:

✅ Simple text captchas
✅ High request volume
✅ Limited budget
✅ Need full control

When to use commercial services:

✅ Complex captchas (reCAPTCHA, hCaptcha, FunCaptcha)
✅ Low volume (few requests per day)
✅ Need guaranteed maximum accuracy
✅ Don't want to maintain infrastructure

Real-World Use Cases

This solver has been used in production for:

✅ Testing and development of captcha systems
✅ Security research and penetration testing
✅ Automated testing of web applications
✅ OCR text extraction for data processing

Reported savings: from ~$500/month with third-party services to ~$7/month for a Heroku dyno.

Contributing

Contributions are welcome:

Fork the project
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

MIT License. See the LICENSE file for the full terms.

Credits

OCR Engine: Tesseract.js - JavaScript port of Google's Tesseract
Developed by: srdanirz
Context: Created as an affordable alternative to commercial captcha-solving services

Disclaimer

This project is intended for educational purposes, security research, and testing of systems you own or have authorization to test. Users are responsible for ensuring compliance with applicable laws and terms of service.

