Self-hosted OCR API built with Tesseract.js to solve ImageToText captchas efficiently and cost-effectively.
This project was created as an affordable alternative to commercial captcha-solving services, specifically for ImageToText captchas (text in images).
The Problem:
- Third-party services charge per captcha solved (~$0.001-$0.003 per captcha)
- Costs scale rapidly with high request volumes (thousands of captchas/day)
- Critical external dependency in your infrastructure
- You only need to solve simple text captchas, not complex reCAPTCHAs
The Solution:
- Self-hosted API with Tesseract.js for ImageToText captchas
- Fixed cost (only hosting ~$7/month)
- No request limits beyond configured rate limiting
- Full control over infrastructure
- Processes base64 images and returns filtered text
Result: ~90% lower cost for the same ImageToText use case.
This solver is optimized for ImageToText captchas:
- ✅ Simple text in images (letters and numbers)
- ✅ Basic alphanumeric captchas
- ✅ Captchas with solid color backgrounds
- ❌ reCAPTCHA (requires specialized services)
- ❌ hCaptcha (requires specialized services)
- ❌ Captchas with severe distortion
The solver processes captcha images in several steps:
- The image is sent as base64 and Tesseract.js extracts the text.
- Before filtering (raw text): OCR may detect extra characters, spaces, or unwanted symbols.
- After filtering (clean text): automatic filtering removes everything except A-Z and 0-9, leaving only the captcha code.
[Image: Example 1, simple captcha]
[Image: Example 2, captcha with background]
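The filtering step described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function name `filterCaptchaText` is hypothetical, and keeping lowercase letters is an assumption, since the text only mentions A-Z and 0-9.

```javascript
// Hypothetical helper: keep only letters and digits from raw OCR output.
// Assumption: lowercase letters are kept as-is; the README only names A-Z and 0-9.
function filterCaptchaText(raw) {
  // Replace every character that is not a letter or digit with nothing.
  return String(raw).replace(/[^A-Za-z0-9]/g, '');
}

console.log(filterCaptchaText(' A B C 1 2 3 ')); // ABC123
console.log(filterCaptchaText('AB-CD 12.34\n')); // ABCD1234
```

If the captchas you target are uppercase only, a stricter variant could call `.toUpperCase()` before filtering.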
The API returns the filtered text ready to use:
{
"text": "ABC123",
"raw": " A B C 1 2 3 ",
"confidence": "high"
}
Features:
- ✅ Fast OCR with Tesseract.js
- ✅ Simple REST API (POST endpoint)
- ✅ Built-in rate limiting (100 req/15min per IP)
- ✅ Automatic non-alphanumeric character filtering
- ✅ Heroku deployment ready
- ✅ Base64 image support
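As an illustration of the base64 image support, the sketch below shows one way a server could turn the `imageData` string into bytes before handing them to the OCR engine. The helper name `decodeImageData` and the handling of a `data:` URL prefix are assumptions made for this example; the project may validate input differently.

```javascript
// Hypothetical helper: convert the imageData field into a Buffer.
function decodeImageData(imageData) {
  if (typeof imageData !== 'string' || imageData.trim() === '') {
    throw new TypeError('imageData must be a non-empty base64 string');
  }
  // Assumption: clients may send a data URL, so strip a prefix such as
  // "data:image/png;base64," and drop any whitespace or line breaks.
  const base64 = imageData
    .replace(/^data:[^;,]+;base64,/, '')
    .replace(/\s+/g, '');
  const buffer = Buffer.from(base64, 'base64');
  if (buffer.length === 0) {
    throw new TypeError('imageData did not decode to any bytes');
  }
  return buffer;
}

// Usage: the first four bytes of a PNG file, encoded and decoded again.
const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47]);
console.log(decodeImageData(pngSignature.toString('base64')).length); // 4
```

Rejecting empty input early keeps obviously bad requests away from the comparatively slow OCR step.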
Tech stack:
- Runtime: Node.js
- Framework: Express.js
- OCR Engine: Tesseract.js (based on Google's Tesseract)
- Rate Limiting: express-rate-limit
- Deploy: Heroku compatible (Procfile included)
git clone https://github.com/srdanirz/imagetotext-solver.git
cd imagetotext-solver
npm install
cp .env.example .env
Edit .env:
PORT=3000
RATE_LIMIT_WINDOW_MS=900000 # 15 minutes
RATE_LIMIT_MAX=100 # Max requests per window
npm start
The server will be available at http://localhost:3000
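To show how the three settings above might be consumed, here is a dependency-free sketch that reads them with the documented defaults. The function name `loadRateLimitConfig` is hypothetical and the project's real startup code may differ.

```javascript
// Sketch: read PORT and the rate-limit settings, falling back to the
// defaults from the README (3000, 900000 ms, 100 requests).
function loadRateLimitConfig(env = process.env) {
  const toPositiveInt = (value, fallback) => {
    const parsed = Number.parseInt(value, 10);
    return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
  };
  return {
    port: toPositiveInt(env.PORT, 3000),
    windowMs: toPositiveInt(env.RATE_LIMIT_WINDOW_MS, 900000), // 15 minutes
    max: toPositiveInt(env.RATE_LIMIT_MAX, 100), // requests per window per IP
  };
}

// windowMs and max are option names accepted by express-rate-limit, so the
// values could be passed on as rateLimit({ windowMs, max }). That call is
// not executed here, to keep the sketch free of dependencies.
console.log(loadRateLimitConfig({ RATE_LIMIT_MAX: '200' }));
```

Falling back to defaults on missing or malformed values means a typo in `.env` does not silently disable rate limiting.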
Request:
curl -X POST http://localhost:3000/ocr \
-H "Content-Type: application/json" \
-d '{
"imageData": "base64_encoded_image_here"
}'
Request Body:
{
"imageData": "iVBORw0KGgoAAAANSUhEUgAA..."
}
Response (Success):
{
"text": "ABCD1234",
"raw": " A B C D 1 2 3 4 ",
"confidence": "high"
}
Response (Error):
{
"error": "OCR processing failed",
"message": "Error processing image"
}
Health check endpoint for monitoring.
Response:
{
"status": "ok",
"uptime": 12345.67,
"timestamp": "2025-11-01T19:30:00.000Z"
}
Node.js example:
const fs = require('fs');
const axios = require('axios');

// Read image and convert to base64
const imageBuffer = fs.readFileSync('captcha.png');
const base64Image = imageBuffer.toString('base64');

// Make request to OCR server
axios.post('http://localhost:3000/ocr', {
  imageData: base64Image
})
  .then(response => {
    console.log('Recognized text:', response.data.text);
  })
  .catch(error => {
    console.error('Error:', error.message);
  });
Python example:
import base64
import requests

# Read image and convert to base64
with open('captcha.png', 'rb') as image_file:
    base64_image = base64.b64encode(image_file.read()).decode('utf-8')

# Make request to OCR server
response = requests.post('http://localhost:3000/ocr', json={
    'imageData': base64_image
})
if response.status_code == 200:
    print('Recognized text:', response.json()['text'])
else:
    print('Error:', response.json()['error'])
This project is Heroku-ready:
# Login to Heroku
heroku login
# Create app
heroku create my-ocr-server
# Deploy
git push heroku main
# Check logs
heroku logs --tail
Environment variables on Heroku:
heroku config:set RATE_LIMIT_MAX=200
heroku config:set RATE_LIMIT_WINDOW_MS=900000

| Feature | ImageToText Solver (Self-hosted) | Commercial Services |
|---|---|---|
| Cost | Fixed (hosting only, ~$7/month) | ~$0.001-$0.003 per captcha |
| Scalability | Limited only by your server capacity | Unlimited (pay per use) |
| Latency | Low (local/VPS) | Medium (external API) |
| Accuracy | 80-95% (simple captchas) | 95-99% (all types) |
| Maintenance | You maintain it | No maintenance |
| Dependency | Self-hosted | Third-party service |
| Types supported | Text-only | All (reCAPTCHA, hCaptcha, etc.) |
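
The cost row of the table implies a break-even volume, which the short calculation below works out. It uses only the figures quoted in this README (~$7/month hosting, $0.001 to $0.003 per captcha, and the ~$500/month commercial bill mentioned later); the function names are illustrative, and the result ignores the cost of your own maintenance time.

```javascript
// Monthly captcha volume at which fixed hosting costs the same as
// paying a commercial service per captcha.
function breakEvenCaptchas(hostingPerMonth, pricePerCaptcha) {
  return Math.ceil(hostingPerMonth / pricePerCaptcha);
}

// Percentage saved when a monthly commercial bill is replaced by hosting.
function savingsPercent(commercialPerMonth, hostingPerMonth) {
  return (1 - hostingPerMonth / commercialPerMonth) * 100;
}

console.log(breakEvenCaptchas(7, 0.003)); // 2334 captchas/month
console.log(breakEvenCaptchas(7, 0.001)); // 7000 captchas/month
console.log(savingsPercent(500, 7).toFixed(1)); // 98.6
```

Below roughly 2,300 to 7,000 captchas per month, depending on the per-captcha price, a commercial service is likely cheaper; above that, the fixed hosting cost wins.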
When to use ImageToText Solver:
- ✅ Simple text captchas
- ✅ High request volume
- ✅ Limited budget
- ✅ Need full control
When to use commercial services:
- ✅ Complex captchas (reCAPTCHA, hCaptcha, FunCaptcha)
- ✅ Low volume (few requests per day)
- ✅ Need guaranteed maximum accuracy
- ✅ Don't want to maintain infrastructure
This solver has been used in production for:
- ✅ Testing and development of captcha systems
- ✅ Security research and penetration testing
- ✅ Automated testing of web applications
- ✅ OCR text extraction for data processing
Real-world savings: from ~$500/month with third-party services to ~$7/month for hosting on a Heroku dyno.
Contributions are welcome:
- Fork the project
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
MIT License - Use this code however you want.
- OCR Engine: Tesseract.js - JavaScript port of Google's Tesseract
- Developed by: srdanirz
- Context: Created as an affordable alternative to commercial captcha-solving services
This project is intended for educational purposes, security research, and testing of systems you own or have authorization to test. Users are responsible for ensuring compliance with applicable laws and terms of service.