IMDB Sentiment Analysis (Logistic Regression and Transformers)

Project Overview

This project is a submission for the Fellowship.ai Cohort 33 challenge. It implements sentiment analysis on the IMDB dataset using two approaches, a TF-IDF plus Logistic Regression baseline and a Transformer-based model, to demonstrate proficiency in natural language processing and machine learning.

Challenge Description

The challenge involves building a sentiment analysis system that can effectively classify movie reviews as positive or negative, showcasing:

- Data preprocessing capabilities
- Feature engineering skills
- Model implementation and evaluation
- Code organization and documentation
- Use of modern NLP techniques (spaCy and Transformers)

Technical Implementation

- Data preprocessing using spaCy with GPU acceleration
- TF-IDF vectorization for feature extraction
- Logistic Regression for baseline classification
- Transformer-based models for advanced sentiment analysis
- Visualizations and evaluation metrics (see Results and Visualization)

Requirements

- Python 3.x
- Google Colab (for original notebook execution)
- Required packages:
  - pandas
  - spacy
  - scikit-learn
  - seaborn
  - matplotlib
  - transformers
  - kaggle
  - tqdm

Key Features

Advanced Preprocessing

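def preprocess_text(text, nlp):
    """Clean one review, lemmatize it with spaCy, and return a space-joined string."""
    text = clean_text(text)  # clean_text is a helper that is not shown here
    doc = nlp(text)  # nlp is a loaded spaCy pipeline, e.g. en_core_web_sm
    # Keep the lemmas of alphabetic tokens longer than two characters
    tokens = [token.lemma_ for token in doc if token.is_alpha and len(token.text) > 2]
    return ' '.join(tokens)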
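
The clean_text helper called above is not shown in this README. As a rough guess at what such a step might do for IMDB reviews, which contain HTML line breaks such as `<br />`, here is a minimal sketch using only the standard library. The function body, the regular expressions, and the choice to lowercase are assumptions for illustration, not the notebook's actual implementation.

```python
import html
import re

_TAG_RE = re.compile(r"<[^>]+>")   # matches markup such as "<br />"
_SPACE_RE = re.compile(r"\s+")     # matches any run of whitespace


def clean_text(text: str) -> str:
    """Hypothetical cleaner; the notebook's real clean_text may differ."""
    text = html.unescape(text)       # decode entities, e.g. "&amp;" becomes "&"
    text = _TAG_RE.sub(" ", text)    # replace HTML tags with a space
    text = _SPACE_RE.sub(" ", text)  # collapse repeated whitespace
    return text.strip().lower()


print(clean_text("Loved it!<br /><br />A   GREAT film &amp; cast."))
```

Punctuation is left in place here because preprocess_text already discards non-alphabetic tokens after spaCy tokenization.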

GPU Acceleration

- Uses spaCy's GPU support for faster preprocessing
- Optimized for Google Colab's GPU environment
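
One way to request the GPU in spaCy and process reviews in batches is sketched below. `spacy.prefer_gpu()` and `nlp.pipe()` are real spaCy calls; the function names `load_nlp` and `lemmatize_batch` and the batch size of 256 are illustrative assumptions, and the notebook may organize this differently.

```python
def load_nlp(model_name="en_core_web_sm"):
    """Load a spaCy pipeline, asking for the GPU first if one is usable."""
    import spacy  # imported lazily so lemmatize_batch has no hard dependency

    # prefer_gpu() should run before spacy.load(); it returns False and
    # spaCy stays on the CPU when no usable GPU setup is found.
    using_gpu = spacy.prefer_gpu()
    print(f"spaCy running on {'GPU' if using_gpu else 'CPU'}")
    return spacy.load(model_name)


def lemmatize_batch(texts, nlp, batch_size=256):
    """Lemmatize many reviews with nlp.pipe, which processes them in batches."""
    cleaned = []
    for doc in nlp.pipe(texts, batch_size=batch_size):
        # Same filter as preprocess_text: alphabetic tokens longer than 2 chars
        tokens = [t.lemma_ for t in doc if t.is_alpha and len(t.text) > 2]
        cleaned.append(" ".join(tokens))
    return cleaned
```

A typical call would be `lemmatize_batch(reviews, load_nlp())`, where `reviews` is a list of already cleaned strings. Batching through `nlp.pipe` is generally faster than calling `nlp(text)` once per review, on CPU as well as GPU.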

Model Pipeline

- Data cleaning and preprocessing
- Feature extraction using TF-IDF
- Model training with Logistic Regression
- Performance evaluation and visualization
- Advanced sentiment analysis using Transformers
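
The last step of the pipeline could look roughly like the sketch below, which uses the Hugging Face `pipeline("sentiment-analysis")` API. The README does not say which checkpoint the notebook uses, so the default model is an assumption, as are the helper names `to_binary_label` and `classify_reviews`. The label mapping assumes the model reports labels such as POSITIVE and NEGATIVE; checkpoints that emit names like LABEL_0 would need a different mapping.

```python
def to_binary_label(result):
    """Map one pipeline result dict to 1 (positive) or 0 (negative)."""
    return 1 if result["label"].upper().startswith("POS") else 0


def classify_reviews(texts, model_name=None):
    """Classify reviews with a pretrained Transformer sentiment model."""
    from transformers import pipeline  # lazy import: heavy dependency

    # model=None lets the library pick its default sentiment checkpoint
    classifier = pipeline("sentiment-analysis", model=model_name)
    # truncation=True cuts reviews that exceed the model's maximum length
    results = classifier(list(texts), truncation=True)
    return [to_binary_label(r) for r in results]
```

Calling `classify_reviews(["A moving, beautifully shot film."])` downloads the model on first use and returns a list of 0/1 labels, which makes the output directly comparable with the Logistic Regression predictions.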

Running the Project

- Open the notebook (fellowship_sentiment_analysis_challenge_basab.ipynb) in Google Colab
- Upload your Kaggle API credentials
- Run all cells sequentially
- Review the visualizations and performance metrics

Implementation Details

- Uses spaCy's en_core_web_sm model for preprocessing
- Implements TF-IDF vectorization with:
  - max_features: 50,000
  - ngram_range: (1, 2)
- Logistic Regression parameters:
  - C: 1.0
  - max_iter: 1000
  - n_jobs: -1 (parallel processing)
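
The settings above map onto scikit-learn roughly as in the sketch below. It assumes the notebook uses `TfidfVectorizer` and `LogisticRegression` from scikit-learn, which the parameter names suggest but the README does not state outright. The function name `build_baseline` and the four toy reviews are invented for illustration. `n_jobs=-1` is left out because, according to the scikit-learn documentation, it only parallelizes over classes in one-vs-rest multiclass fits, so it should not change a binary fit.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_baseline():
    """TF-IDF (unigrams and bigrams) followed by Logistic Regression."""
    vectorizer = TfidfVectorizer(max_features=50_000, ngram_range=(1, 2))
    classifier = LogisticRegression(C=1.0, max_iter=1000)
    return make_pipeline(vectorizer, classifier)


# Toy data, not from the IMDB dataset; 1 = positive, 0 = negative
train_texts = [
    "great wonderful film",
    "loved this movie",
    "terrible boring film",
    "hated this movie",
]
train_labels = [1, 1, 0, 0]

model = build_baseline().fit(train_texts, train_labels)
predictions = model.predict(["great wonderful", "terrible boring"])
print(predictions)
```

Wrapping both steps in one pipeline means the vectorizer is fitted only on the training texts, which avoids leaking vocabulary statistics from the test set.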

Results and Visualization

The project provides:

- Classification metrics
- Confusion matrix visualization
- Review length distribution analysis
- Sample predictions using transformer models
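
As an illustration of the first two items, the sketch below computes a confusion matrix and a per-class report with scikit-learn. The label vectors are made up for the example and are not results from this project. The encoding 0 = negative, 1 = positive is also an assumption.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Made-up labels for illustration only (0 = negative, 1 = positive)
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)

# Precision, recall, and F1 per class, plus overall accuracy
print(classification_report(y_true, y_pred, target_names=["negative", "positive"]))
```

The matrix can then be drawn with `seaborn.heatmap(cm, annot=True, fmt="d")`, which matches the seaborn and matplotlib dependencies listed under Requirements.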

Future Improvements

- Implement cross-validation
- Add more advanced preprocessing techniques
- Experiment with different transformer architectures
- Add model comparison metrics
- Implement model serialization

Notes

- This project was developed as part of the Fellowship.ai Cohort 33 application process.
- It was originally developed in Google Colab for GPU acceleration.
- It focuses on demonstrating both traditional ML and modern NLP approaches.
