Parliament Scraper
A Laravel-based web scraper that extracts parliament member, committee, and bill information from parliament.bg and adds AI-powered analysis on top of the collected data.
🚀 Key Features
🏛️ Parliament Data
Extract detailed information about parliament members, committees, and their relationships
📄 Bills Tracking
Scrape and monitor legislative bills with PDF text extraction and automated analysis
🎥 Video Transcription
AI-powered transcription of committee meeting videos using ElevenLabs Speech-to-Text
📜 Transcript Analysis
Automated analysis of meeting transcripts for bill discussions and amendments
🔍 Protocol Extraction
Extract structured protocol changes using advanced language processing
📊 Data Export
Export data to CSV, JSON, and HTML formats with proper Bulgarian text support
🔧 Quick Start
Installation
# Clone the repository
git clone <repository-url>
cd parliament-scraper
# Install dependencies
composer install
npm install
# Set up environment
cp .env.example .env
php artisan key:generate
# Run migrations
php artisan migrate
Basic Usage
# Scrape parliament members
php artisan parliament:scrape
# Scrape committees
php artisan committees:scrape
# Scrape bills for a committee
php artisan bills:scrape --committee-id=3613
# Transcribe meeting videos
php artisan videos:transcribe-v2 --committee=3613 --since=2025-01-01
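The commands above can also be chained from a small wrapper script. The following is a minimal Python sketch, not part of the project itself: the function names `pipeline_commands` and `run_pipeline` are hypothetical, and running the steps in the order they are listed is an assumption. Only the artisan commands and flags shown above are used.

```python
import shlex
import subprocess


def pipeline_commands(committee_id: int, since: str) -> list:
    """Build the documented artisan commands as argv lists (hypothetical helper)."""
    cid = str(committee_id)
    return [
        ["php", "artisan", "parliament:scrape"],
        ["php", "artisan", "committees:scrape"],
        ["php", "artisan", "bills:scrape", f"--committee-id={cid}"],
        ["php", "artisan", "videos:transcribe-v2",
         f"--committee={cid}", f"--since={since}"],
    ]


def run_pipeline(committee_id: int, since: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for argv in pipeline_commands(committee_id, since):
        print(shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failed step
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands for committee 3613
    run_pipeline(3613, "2025-01-01", dry_run=True)
```

Passing `dry_run=False` executes the commands from the project root. Keeping the dry run as the default makes it easy to review what would be executed first.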
📖 Documentation
- Features Overview - Detailed feature descriptions
- Installation Guide - Complete setup instructions
- Usage Examples - Command examples and workflows
- API Reference - Parliament.bg API documentation
🎯 Use Cases
- Civic Monitoring: Track parliamentary activities and legislative processes
- Research Projects: Analyze voting patterns and bill discussions
- Transparency Initiatives: Make parliamentary data more accessible
- Academic Studies: Research political discourse and decision-making
- Journalism: Investigate legislative trends and political activities
🌍 Bulgarian Language Support
Full support for Bulgarian text with proper UTF-8 encoding and transliteration:
- Excel-compatible CSV exports with BOM encoding
- Character mapping for safe filenames
- Comprehensive text extraction from PDF documents
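The first two points can be illustrated with a short Python sketch. It is an illustration of the general techniques, not the project's own code: `safe_filename` and `csv_bytes_with_bom` are hypothetical names, and the character map assumes the common streamlined Bulgarian romanization, which may differ from the mapping the scraper actually uses.

```python
import csv
import io

# Bulgarian Cyrillic to Latin map (assumed: streamlined romanization)
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ж": "zh",
    "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m", "н": "n",
    "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f",
    "х": "h", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "sht", "ъ": "a",
    "ь": "y", "ю": "yu", "я": "ya",
}


def safe_filename(name: str) -> str:
    """Transliterate Cyrillic and replace other unsafe characters with '-'."""
    out = []
    for ch in name.lower():
        if ch in CYR_TO_LAT:
            out.append(CYR_TO_LAT[ch])
        elif ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append("-")
    slug = "".join(out)
    # Collapse runs of separators and trim them from both ends
    while "--" in slug:
        slug = slug.replace("--", "-")
    return slug.strip("-")


def csv_bytes_with_bom(rows) -> bytes:
    """Serialize rows as CSV, encoded as UTF-8 with a leading BOM."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    # "utf-8-sig" prepends the BOM (EF BB BF) that Excel uses to detect UTF-8
    return buf.getvalue().encode("utf-8-sig")


if __name__ == "__main__":
    print(safe_filename("Комисия по бюджет и финанси"))
    data = csv_bytes_with_bom([["име", "комисия"], ["Иван Петров", "Бюджет"]])
    print(data[:3])
```

Without the BOM, Excel tends to open UTF-8 CSV files using a legacy code page and display Cyrillic text as garbled characters, which is why the byte-order mark matters for these exports.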
🤖 AI-Powered Features
- Speech-to-Text: Convert meeting videos to searchable text
- Content Analysis: Identify bill discussions and amendments
- Speaker Diarization: Separate and identify different speakers
- Protocol Extraction: Turn unstructured meeting transcripts into structured protocol records
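To give a rough idea of how diarized output can be made readable, the Python sketch below merges consecutive segments from the same speaker into turns. The `Segment` shape and the `merge_turns` helper are assumptions made for illustration. They do not describe the response format of the speech-to-text service or the project's internal data model.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized piece of speech (hypothetical shape)."""
    speaker: str
    start: float  # seconds from the start of the recording
    end: float
    text: str


def merge_turns(segments):
    """Merge consecutive segments by the same speaker into single turns."""
    turns = []
    for seg in segments:
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            # Same speaker continues: extend the end time and append the text
            turns[-1] = Segment(last.speaker, last.start, seg.end,
                                f"{last.text} {seg.text}")
        else:
            turns.append(Segment(seg.speaker, seg.start, seg.end, seg.text))
    return turns


if __name__ == "__main__":
    raw = [
        Segment("speaker_0", 0.0, 1.2, "Откривам"),
        Segment("speaker_0", 1.2, 2.5, "заседанието."),
        Segment("speaker_1", 2.8, 4.0, "Благодаря."),
    ]
    for turn in merge_turns(raw):
        print(f"[{turn.start:.1f}-{turn.end:.1f}] {turn.speaker}: {turn.text}")
```

Speaker-level turns like these are a convenient unit for later steps such as searching for bill mentions, since each turn carries a speaker label and a time range.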
📊 Data Coverage
The system covers:
- 50+ Parliamentary Committees
- 1000+ Parliament Members
- Legislative Bills with full text
- Meeting Transcripts and Videos
- Historical Data going back to 2021