A sophisticated sentiment analysis web application that combines multiple AI models for accurate sentiment detection. The application uses TextBlob, VADER, and transformer models to provide comprehensive sentiment analysis with confidence scores and visualizations.
- Multi-Model Analysis: Combines TextBlob, VADER, and transformer models for ensemble predictions
- Single & Batch Processing: Analyze individual texts or multiple texts at once
- Interactive Web Interface: Modern, responsive UI with real-time results
- Data Visualizations: Pie charts and histograms for batch analysis results
- REST API: Full API endpoints for programmatic access
- Text Preprocessing: Automatic cleaning and preprocessing of input text
- Confidence Scoring: Provides confidence levels for all predictions
- TextBlob: Rule-based sentiment analysis with polarity and subjectivity scores
- VADER: Lexicon and rule-based sentiment analysis optimized for social media text
- Transformers: Deep learning models (RoBERTa/DistilBERT) for state-of-the-art accuracy
1. Clone or download the project files.

2. Install Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Download NLTK data (automatically handled on first run):

   ```python
   import nltk
   nltk.download('punkt')
   nltk.download('stopwords')
   ```

4. Start the application:

   ```bash
   python app.py
   ```

   The application will start on http://localhost:5000.
Single Text Analysis:
- Enter text in the input field
- Click "Analyze Sentiment"
- View detailed results with model breakdown
Batch Analysis:
- Switch to "Batch Analysis" mode
- Enter multiple texts (one per line)
- Click "Batch Analyze"
- View summary statistics and visualizations
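The same analysis is available programmatically via the REST API below. A minimal client sketch using only the Python standard library (it assumes the server is running locally on the default port 5000):

```python
import json
from urllib import request

ANALYZE_URL = "http://localhost:5000/analyze"  # assumes the default local setup

def build_payload(text):
    """Encode a single text as the JSON body expected by /analyze."""
    return json.dumps({"text": text}).encode("utf-8")

def analyze(text, url=ANALYZE_URL):
    """POST one text to the /analyze endpoint and return the parsed JSON response."""
    req = request.Request(
        url,
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example (requires the server to be running):
# result = analyze("I love this product!")
# print(result["final_sentiment"], result["confidence"])
```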
```
POST /analyze
Content-Type: application/json
```

Request body:

```json
{
  "text": "I love this product! It's amazing."
}
```

Response:

```json
{
  "original_text": "I love this product! It's amazing.",
  "preprocessed_text": "I love this product! It's amazing.",
  "final_sentiment": "positive",
  "confidence": 0.85,
  "sentiment_scores": {
    "positive": 2.1,
    "negative": 0.0,
    "neutral": 0.2
  },
  "individual_results": {
    "textblob": {
      "sentiment": "positive",
      "polarity": 0.625,
      "subjectivity": 0.9,
      "confidence": 0.625
    },
    "vader": {
      "sentiment": "positive",
      "compound": 0.6249,
      "positive": 0.661,
      "negative": 0.0,
      "neutral": 0.339,
      "confidence": 0.6249
    },
    "transformer": {
      "sentiment": "positive",
      "confidence": 0.9998,
      "all_scores": [...]
    }
  }
}
```

```
POST /batch_analyze
Content-Type: application/json
```

```json
{
  "texts": [
    "I love this!",
    "This is terrible.",
    "It's okay, I guess."
  ]
}
```

```
POST /visualize
Content-Type: application/json
```

```json
{
  "results": [/* array of analysis results */]
}
```

```
GET /health
```

```
POST /analyze_cdc
Content-Type: application/json
```

```json
{
  "area1_remarks": ["string", "string"],
  "area2_remarks": ["string", "string"],
  "area3_remarks": ["string", "string"],
  "area4_remarks": ["string", "string"],
  "area5_remarks": ["string", "string"],
  "area6_remarks": ["string", "string"],
  "area7_remarks": ["string", "string"]
}
```

Response:

```json
{
  "results": [{
    "cdc_id": 1,
    "area1_remarks": ["string", "string"],
    "area1_sentimental": ["Positive", "Positive"],
    "area2_remarks": ["string", "string"],
    "area2_sentimental": ["Positive", "Positive"],
    "area3_remarks": ["string", "string"],
    "area3_sentimental": ["Positive", "Positive"],
    "area4_remarks": ["string", "string"],
    "area4_sentimental": ["Positive", "Positive"],
    "area5_remarks": ["string", "string"],
    "area5_sentimental": ["Positive", "Positive"],
    "area6_remarks": ["string", "string"],
    "area6_sentimental": ["Positive", "Positive"],
    "area7_remarks": ["string", "string"],
    "area7_sentimental": ["Positive", "Positive"]
  }],
  "summary": {
    "average_sentiment": "positive",
    "average_confidence": 0.85,
    "positive_count": 7,
    "negative_count": 0,
    "neutral_count": 0,
    "total_texts": 7
  }
}
```

Text preprocessing includes:

- URL removal
- User mention and hashtag removal
- Whitespace normalization
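A minimal sketch of these cleaning steps (illustrative only; the exact patterns in `app.py` may differ):

```python
import re

def preprocess(text):
    """Apply the cleaning steps listed above: strip URLs, @mentions,
    #hashtags, and collapse extra whitespace."""
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # URL removal
    text = re.sub(r"[@#]\w+", "", text)                # mention and hashtag removal
    text = re.sub(r"\s+", " ", text).strip()           # whitespace normalization
    return text

print(preprocess("Check https://example.com out, @user!  Great stuff"))
```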
The application combines predictions from multiple models using weighted averaging based on confidence scores. The final sentiment is determined by the model with the highest confidence.
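One way to sketch this kind of confidence-weighted ensembling (an illustration, not the app's exact implementation): each model's predicted label contributes its confidence as a vote, and the final label is taken from the summed scores.

```python
def ensemble(individual_results):
    """Sum each model's confidence into its predicted label's score,
    then pick the label with the highest total."""
    scores = {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    for model, result in individual_results.items():
        scores[result["sentiment"]] += result["confidence"]
    final = max(scores, key=scores.get)
    return final, scores

final, scores = ensemble({
    "textblob": {"sentiment": "positive", "confidence": 0.625},
    "vader": {"sentiment": "positive", "confidence": 0.6249},
    "transformer": {"sentiment": "positive", "confidence": 0.9998},
})
print(final, scores)
```

With these inputs all three models agree, so the summed positive score dominates, mirroring the `sentiment_scores` field in the example response above.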
- TextBlob and VADER load instantly
- Transformer models are loaded asynchronously
- Fallback mechanisms ensure the app works even if transformer models fail to load
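The fallback pattern can be sketched roughly as follows; `_load_transformer` here is a hypothetical stand-in for loading the real transformers pipeline, with the failure simulated:

```python
def _load_transformer():
    """Stand-in for loading a transformers pipeline; in the real app this
    can fail (no network, insufficient memory) and raise an exception."""
    raise RuntimeError("transformer model unavailable")

try:
    transformer = _load_transformer()
except Exception:
    transformer = None  # fall back to TextBlob + VADER only

def active_models():
    models = ["textblob", "vader"]
    if transformer is not None:
        models.append("transformer")
    return models

print(active_models())  # → ['textblob', 'vader'] when the load fails
```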
- Memory Usage: Transformer models require significant memory (~500MB-1GB)
- Processing Time: Single text analysis: <1s, Batch processing: varies by size
- Concurrent Users: Flask development server supports limited concurrency
For production use:
1. Use a production WSGI server:

   ```bash
   gunicorn -w 4 -b 0.0.0.0:5000 app:app
   ```

2. Set environment variables:

   ```bash
   export FLASK_ENV=production
   export FLASK_DEBUG=False
   ```

3. Resource requirements:
   - RAM: 2GB+ recommended
   - CPU: 2+ cores for good performance
   - Storage: 1GB+ for models
Transformer model fails to load:
- Check internet connection for model download
- Ensure sufficient memory is available
- The app will fall back to TextBlob + VADER only

NLTK data missing:
- Run:

  ```bash
  python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
  ```

Port already in use:
- Change the port in `app.py`:

  ```python
  app.run(port=5001)
  ```
- Use GPU for transformer models (requires PyTorch GPU support)
- Implement caching for repeated analyses
- Use async processing for batch operations
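Caching for repeated analyses, for instance, can be as simple as memoizing on the input text. A sketch using `functools.lru_cache`, where `analyze_sentiment` is a hypothetical stand-in for the app's real analysis function:

```python
from functools import lru_cache

@lru_cache(maxsize=4096)
def analyze_sentiment(text):
    """Stand-in for the expensive multi-model analysis; with lru_cache,
    repeated identical inputs skip recomputation entirely."""
    # ... run TextBlob, VADER, and transformer models here ...
    return {"text": text, "sentiment": "positive"}  # placeholder result

analyze_sentiment("I love this!")  # computed
analyze_sentiment("I love this!")  # served from cache
print(analyze_sentiment.cache_info())
```

Note that `lru_cache` returns the same object for repeated calls, so cached results should be treated as read-only.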
This project is open source and available under the MIT License.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
For issues and questions:
- Check the troubleshooting section
- Review the API documentation
- Create an issue with detailed error information