145 changes: 145 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,145 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added

#### 🚀 Global Installation Support
- **uv Script Integration**: Added inline script metadata for seamless dependency management
- **Global Command**: Can now be installed as a global `generate-llmstxt` command, available from any directory
- **Automatic Dependencies**: Uses uv to automatically manage Python dependencies
- **Cross-Directory Support**: Script works from any directory by finding its own .env file

#### 📋 Bulk URL Processing
- **Multi-URL Support**: Added `--urls-file` option to process multiple URLs from a text file
- **Consolidated Indexing**: Bulk mode generates a single master index instead of individual indexes
- **Smart File Organization**: Individual `_full.txt` files per URL with timestamped consolidated index
- **Timestamp Protection**: Consolidated index files include datetime stamps to prevent overwrites during retries
- **Comment Support**: The URLs file supports `#` comments for organization (parsing sketched below)
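
A minimal sketch of how the URLs file could be read, assuming a hypothetical `load_urls` helper; blank lines and `#` comment lines are skipped:

```python
from pathlib import Path

def load_urls(urls_file: str) -> list[str]:
    """Read one URL per line, skipping blank lines and # comments (hypothetical helper)."""
    urls: list[str] = []
    for line in Path(urls_file).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment line
        urls.append(line)
    return urls
```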

#### 🔄 Advanced Error Handling & Recovery
- **Failed URL Tracking**: Automatically tracks URLs that fail during processing
- **Retry File Generation**: Auto-generates `urls-failed.txt` for easy re-processing (sketched after this list)
- **Partial Success**: Continues processing remaining URLs even when some fail
- **Detailed Error Summary**: Shows specific failed URLs and retry commands in summary
- **Recovery Guidance**: Provides exact commands to retry failed URLs
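
The failure handling described above might look roughly like the following sketch; `process_all` and the `process_url` callback are illustrative names, not the script's actual API:

```python
from typing import Callable

def process_all(urls: list[str], process_url: Callable[[str], None]) -> list[str]:
    """Run process_url on every URL, collecting failures instead of stopping (sketch)."""
    failed: list[str] = []
    for url in urls:
        try:
            process_url(url)
            print(f"✓ {url}")
        except Exception as exc:
            print(f"✗ {url}: {exc}")
            failed.append(url)  # keep going; the rest of the batch still gets processed
    if failed:
        # Write the retry file so the run can be repeated with --urls-file urls-failed.txt
        with open("urls-failed.txt", "w", encoding="utf-8") as fh:
            fh.write("\n".join(failed) + "\n")
    return failed
```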

#### 📝 Intelligent Filename Generation
- **Path-Based Naming**: Filenames include meaningful parts of the URL path
- **Extension Removal**: Automatically removes file extensions (`.md`, `.html`, `.php`, etc.)
- **Domain Cleanup**: Removes common domain extensions (`.com`, `.io`, `.org`, etc.)
- **Collision Prevention**: Unique filenames prevent overwrites when processing multiple URLs from the same domain
- **Filesystem Safe**: Replaces special characters and limits filename length (see the sketch below)
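
The naming rules above might look roughly like this sketch; the helper name and the exact extension and domain lists are assumptions. For a URL such as `https://docs.anthropic.com/claude/code/hooks`, it produces the `docs_anthropic_claude_code_hooks` base name shown in the examples below:

```python
import re
from urllib.parse import urlparse

def url_to_filename(url: str, max_len: int = 80) -> str:
    """Build a filesystem-safe, path-aware base name from a URL (illustrative sketch)."""
    parsed = urlparse(url)
    # Drop common domain extensions: docs.anthropic.com -> docs.anthropic
    domain = re.sub(r"\.(com|io|org|net|dev)$", "", parsed.netloc)
    # Keep meaningful path parts, stripping file extensions such as .md, .html, .php
    parts = [re.sub(r"\.(md|html?|php)$", "", p) for p in parsed.path.split("/") if p]
    name = "_".join([domain] + parts)
    # Replace anything that is not filesystem-safe and cap the length
    name = re.sub(r"[^A-Za-z0-9_]+", "_", name)
    return name[:max_len]
```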

#### 🎯 Enhanced User Experience
- **Progress Indicators**: Shows completion status for each URL during bulk processing
- **Visual Feedback**: Uses ✓ and ✗ symbols for success/failure status
- **Rate Limiting**: Automatic delays between URLs to avoid hitting API rate limits (pacing sketched below)
- **Improved Logging**: Better structured logging with progress information
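
The pacing between URLs could be as simple as a fixed pause with a progress counter; the `delay_seconds` value below is an assumption, not the script's actual setting:

```python
import time

def paced(urls: list[str], delay_seconds: float = 2.0):
    """Yield URLs with a progress counter, pausing briefly between them (sketch)."""
    total = len(urls)
    for i, url in enumerate(urls, start=1):
        print(f"[{i}/{total}] {url}")
        yield url
        if i < total:
            time.sleep(delay_seconds)  # simple pacing to stay under API rate limits
```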

### Changed

#### 🔧 Argument Structure
- **Optional URL**: Main `url` argument is now optional when using `--urls-file`
- **Validation Logic**: Enhanced input validation for URL vs. file processing modes (sketched after this list)
- **Error Messages**: More descriptive error messages for missing inputs
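
A sketch of how the new argument rules might be enforced; this is illustrative, not the script's exact argparse setup:

```python
import argparse

def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Accept either a single URL or --urls-file, requiring at least one (sketch)."""
    parser = argparse.ArgumentParser(prog="generate-llmstxt")
    parser.add_argument("url", nargs="?", help="single URL to process")
    parser.add_argument("--urls-file", help="file with one URL per line; # comments allowed")
    args = parser.parse_args(argv)
    if not args.url and not args.urls_file:
        parser.error("provide a URL or use --urls-file")  # descriptive missing-input message
    return args
```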

#### 📊 Output Behavior
- **Single vs Bulk Mode**: Different file generation strategies based on input type
- Single URL: Generates both `.txt` (index) and `_full.txt` (content)
- Bulk Mode: Generates individual `_full.txt` files + consolidated index
- **Filename Format**: Changed from domain-based to path-inclusive naming
- Old: `docs.anthropic.com-llms.txt`
- New: `docs_anthropic_claude_code_hooks.txt`

#### 🛠 Technical Improvements
- **Environment Loading**: Enhanced `.env` file discovery based on the script's own location (sketched after this list)
- **Dependency Management**: Moved from requirements.txt to inline script dependencies
- **Python Version**: Updated requirement to Python 3.13+ for uv script support
- **Error Resilience**: Better handling of API rate limits and network issues
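
The `.env` discovery mentioned under Environment Loading can be approximated by resolving the path against the script file rather than the caller's working directory; a minimal sketch using python-dotenv (already among the script's dependencies):

```python
from pathlib import Path
from dotenv import load_dotenv

# Load the .env that sits next to the script, regardless of where it is invoked from.
script_dir = Path(__file__).resolve().parent
load_dotenv(script_dir / ".env")
```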

### Technical Details

#### New Dependencies in Script Metadata
```python
# /// script
# requires-python = ">=3.13"
# dependencies = [
# "openai>=1.3.0",
# "python-dotenv>=1.0.0",
# "requests>=2.31.0",
# ]
# ///
```

#### New Command Line Options
- `--urls-file FILE`: Process multiple URLs from a file (one per line)
- `url` argument is now optional when using `--urls-file`

#### New Output Files
- `{domain}_consolidated_index_{timestamp}.txt`: Master index for bulk operations with timestamp
- `urls-failed.txt`: List of URLs that failed processing for easy retry
- Individual files now use format: `{domain}_{path_parts}_full.txt`

#### Enhanced Error Recovery Workflow
1. Run bulk processing: `generate-llmstxt --urls-file urls.txt`
2. If failures occur, the script auto-generates `urls-failed.txt`
3. Retry failures: `generate-llmstxt --urls-file urls-failed.txt`

### Migration Guide

#### For Existing Users
- **No Breaking Changes**: All existing single-URL commands work unchanged
- **New Installation Method**: Consider switching to global uv installation for convenience
- **Filename Changes**: Newly generated files use the more descriptive, path-based filename format

#### Upgrading from Previous Version
1. Update script: `git pull origin main`
2. For global installation: Re-run the installation commands in README
3. Existing API key setup continues to work unchanged

### Examples

#### Before (Single URL Only)
```bash
python generate-llmstxt.py https://docs.example.com/page
# Generated: docs.example.com-llms.txt, docs.example.com-llms-full.txt
```

#### After (Enhanced Single URL)
```bash
generate-llmstxt https://docs.example.com/page
# Generated: docs_example_page.txt, docs_example_page_full.txt
```

#### New (Bulk Processing)
```bash
# Create URLs file
echo "https://docs.example.com/quickstart" >> urls.txt
echo "https://docs.example.com/api/reference" >> urls.txt

# Process all URLs
generate-llmstxt --urls-file urls.txt
# Generated:
# - docs_example_quickstart_full.txt
# - docs_example_api_reference_full.txt
# - docs_example_consolidated_index_20250103_143022.txt
# - urls-failed.txt (if any failures)
```

## [1.0.0] - 2024-01-01

### Added
- Initial release with basic single URL processing
- Firecrawl integration for website mapping and scraping
- OpenAI integration for content summarization
- Basic llms.txt and llms-full.txt generation
- Environment variable and .env file support
- Configurable URL limits and output directories
- Parallel processing with batch handling
- Basic error handling and logging