Parliament Scraper

A Laravel-based web scraper that extracts parliament member, committee, and bill information from parliament.bg and adds AI-powered analysis on top of the scraped data.

🚀 Key Features

🏛️ Parliament Data

Extract detailed information about parliament members, committees, and their relationships

📄 Bills Tracking

Scrape and monitor legislative bills with PDF text extraction and automated analysis

🎥 Video Transcription

AI-powered transcription of committee meeting videos using ElevenLabs Speech-to-Text

📜 Transcript Analysis

Automated analysis of meeting transcripts for bill discussions and amendments

🔍 Protocol Extraction

Extract structured protocol changes using advanced language processing

📊 Data Export

Export data to CSV, JSON, and HTML formats with proper Bulgarian text support

🔧 Quick Start

Installation

# Clone the repository
git clone <repository-url>
cd parliament-scraper

# Install dependencies
composer install
npm install

# Set up environment
cp .env.example .env
php artisan key:generate

# Run migrations
php artisan migrate

Basic Usage

# Scrape parliament members
php artisan parliament:scrape

# Scrape committees
php artisan committees:scrape

# Scrape bills for a committee
php artisan bills:scrape --committee-id=3613

# Transcribe meeting videos
php artisan videos:transcribe-v2 --committee=3613 --since=2025-01-01
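
The commands above can be chained for several committees at once. The following is a minimal sketch, not part of the project: the function name scrape_committees and the ARTISAN override variable are hypothetical, and it assumes that scraping bills before transcribing videos is an acceptable order. Only the artisan commands and flags shown above are used.

```shell
#!/bin/sh
# Hypothetical helper: run the bill scraper and the video transcriber
# for each committee ID given on the command line.
#   $1      date passed to --since (for example 2025-01-01)
#   $2...   one or more committee IDs
# ARTISAN is an assumed override, useful for dry runs (ARTISAN=echo).
scrape_committees() {
    since="$1"
    shift
    for id in "$@"; do
        # Stop at the first failing command so errors are not hidden.
        ${ARTISAN:-php artisan} bills:scrape --committee-id="$id" || return 1
        ${ARTISAN:-php artisan} videos:transcribe-v2 --committee="$id" --since="$since" || return 1
    done
}
```

Calling `scrape_committees 2025-01-01 3613` from the project root would run both commands for committee 3613. Setting `ARTISAN=echo` first prints the commands instead of executing them.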

🎯 Use Cases

  • Civic Monitoring: Track parliamentary activities and legislative processes
  • Research Projects: Analyze voting patterns and bill discussions
  • Transparency Initiatives: Make parliamentary data more accessible
  • Academic Studies: Research political discourse and decision-making
  • Journalism: Investigate legislative trends and political activities

🌍 Bulgarian Language Support

Full support for Bulgarian text with proper UTF-8 encoding and transliteration:

  • Excel-compatible CSV exports (UTF-8 with a byte order mark, so Excel detects the encoding)
  • Character mapping for safe filenames
  • Comprehensive text extraction from PDF documents
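
The first two points can be illustrated outside the application. This is a sketch under stated assumptions, not the project's PHP implementation: the function names csv_with_bom and safe_filename are hypothetical, and the character table follows the common Bulgarian Streamlined System romanization, which may differ from the mapping the scraper actually uses.

```shell
#!/bin/sh
# Hypothetical helpers; the project's own implementation may differ.

# Prepend the UTF-8 byte order mark (bytes EF BB BF) to CSV data on stdin
# so that Excel opens the file as UTF-8 and displays Cyrillic correctly.
csv_with_bom() {
    printf '\357\273\277'
    cat
}

# Transliterate Bulgarian Cyrillic to lowercase Latin, then replace every
# remaining character outside [a-z0-9._-] with a single hyphen.
# LC_ALL=C makes sed and tr work on bytes, so the result does not depend
# on the caller's locale.
safe_filename() {
    printf '%s\n' "$1" | LC_ALL=C sed \
        -e 's/а/a/g;s/А/a/g;s/б/b/g;s/Б/b/g;s/в/v/g;s/В/v/g' \
        -e 's/г/g/g;s/Г/g/g;s/д/d/g;s/Д/d/g;s/е/e/g;s/Е/e/g' \
        -e 's/ж/zh/g;s/Ж/zh/g;s/з/z/g;s/З/z/g;s/и/i/g;s/И/i/g' \
        -e 's/й/y/g;s/Й/y/g;s/к/k/g;s/К/k/g;s/л/l/g;s/Л/l/g' \
        -e 's/м/m/g;s/М/m/g;s/н/n/g;s/Н/n/g;s/о/o/g;s/О/o/g' \
        -e 's/п/p/g;s/П/p/g;s/р/r/g;s/Р/r/g;s/с/s/g;s/С/s/g' \
        -e 's/т/t/g;s/Т/t/g;s/у/u/g;s/У/u/g;s/ф/f/g;s/Ф/f/g' \
        -e 's/х/h/g;s/Х/h/g;s/ц/ts/g;s/Ц/ts/g;s/ч/ch/g;s/Ч/ch/g' \
        -e 's/ш/sh/g;s/Ш/sh/g;s/щ/sht/g;s/Щ/sht/g;s/ъ/a/g;s/Ъ/a/g' \
        -e 's/ь/y/g;s/Ь/y/g;s/ю/yu/g;s/Ю/yu/g;s/я/ya/g;s/Я/ya/g' \
    | LC_ALL=C tr 'A-Z' 'a-z' \
    | LC_ALL=C sed -e 's/[^a-z0-9._-]/-/g' -e 's/--*/-/g' -e 's/^-//' -e 's/-$//'
}
```

For example, `safe_filename 'Комисия по бюджет и финанси.csv'` yields `komisiya-po-byudzhet-i-finansi.csv`, and `csv_with_bom < members.csv > members-excel.csv` produces a copy that Excel reads as UTF-8 (members.csv is a placeholder filename).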

🤖 AI-Powered Features

  • Speech-to-Text: Convert meeting videos to searchable text
  • Content Analysis: Identify bill discussions and amendments
  • Speaker Diarization: Separate and identify different speakers
  • Protocol Extraction: Turn unstructured meeting transcripts into structured protocols

📊 Data Coverage

The system covers:

  • 50+ Parliamentary Committees
  • 1000+ Parliament Members
  • Legislative Bills with full text
  • Meeting Transcripts and Videos
  • Historical Data going back to 2021