Features Overview
Parliament Scraper provides comprehensive tools for extracting, processing, and analyzing Bulgarian parliamentary data.
🏛️ Parliament Members Management
Data Extraction
- Complete Member Profiles: Extract all parliament members with detailed information
- Electoral Districts: Track representation by geographic regions
- Political Affiliations: Monitor party memberships and changes
- Professional Backgrounds: Capture member professions and expertise
- Contact Information: Official parliament email addresses
Member Data Includes
- Full names (first, middle, last)
- Electoral district representation
- Political party affiliation
- Professional background
- Official parliament email
- Unique parliament member ID
🏢 Committee Management
Committee Structure
- All Parliamentary Committees: Extract complete committee information
- Member Relationships: Track committee memberships and positions
- Leadership Positions: Identify chairpersons and deputy chairs
- Term Tracking: Monitor membership periods and changes
- Contact Details: Committee email and phone information
Committee Data Includes
- Committee unique identifiers
- Committee names and types
- Active member counts
- Activity periods (date from/to)
- Contact information (email, phone, room)
- Member positions and terms
📄 Legislative Bills Tracking
Bill Information
- Complete Bill Database: Extract all bills by committee
- PDF Document Processing: Download and extract text from bill PDFs
- Submitter Information: Track who submitted each bill
- Committee Assignments: Monitor which committees handle which bills
- Status Tracking: Monitor bill progress and withdrawals
Bill Data Includes
- Unique bill identifiers
- Official bill signatures and numbers
- Bill titles in Bulgarian
- Submission dates
- PDF document URLs and extracted text
- Word and character counts
- Submitter information (JSON)
- Committee assignments
- Withdrawal status
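The word and character counts stored with each bill can be derived directly from the extracted PDF text. The following is a minimal Python sketch, not the project's actual implementation; the function name `text_stats` and the returned keys are hypothetical. It relies on Python's Unicode-aware `\w` so that Cyrillic words are counted correctly.

```python
import re


def text_stats(text: str) -> dict:
    """Return word and character counts for extracted bill text."""
    # In Python 3, \w matches Unicode letters, so Cyrillic words are matched.
    words = re.findall(r"\w+", text)
    return {"word_count": len(words), "char_count": len(text)}


print(text_stats("Закон за бюджета"))
```

Note that the character count here includes whitespace and punctuation; a stricter definition would need to strip those first.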
Advanced Bill Features
- Automatic PDF Processing: Extract text content from bill PDFs
- Detailed Fetching: Optional detailed information extraction
- Scheduled Monitoring: Check for new bills automatically
- Committee Filtering: Process bills for specific committees
🎥 Video Transcription System
Direct URL Processing
- No Downloads Required: Process videos directly from parliament.bg URLs
- Multiple Video Formats: Handle various video file types and structures
- Smart Meeting Discovery: Automatically find meetings with available videos
- Batch Processing: Process multiple videos concurrently
AI-Powered Transcription
- ElevenLabs Integration: Professional-grade speech-to-text API
- Multiple Models: Support for various language models
- Language Detection: Automatic language identification with confidence scores
- Speaker Diarization: Identify and separate different speakers
- Word-Level Timestamps: Precise timing for each word
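Word-level timestamps combined with speaker diarization make it possible to rebuild who said what and when. The sketch below illustrates one way to merge consecutive words from the same speaker into timed turns. The `Word` fields and the `speaker_turns` helper are assumptions made for illustration and do not mirror the exact response shape of any transcription API.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float   # seconds from the start of the recording
    end: float
    speaker: str   # hypothetical diarization label, e.g. "speaker_0"


def speaker_turns(words):
    """Merge consecutive words by the same speaker into timed turns."""
    turns = []
    for w in words:
        if turns and turns[-1]["speaker"] == w.speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + w.text
            turns[-1]["end"] = w.end
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(
                {"speaker": w.speaker, "start": w.start, "end": w.end, "text": w.text}
            )
    return turns
```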
Transcription Features
- Process videos from specific committees or meetings
- Filter by date ranges (year, month, custom dates)
- Resume capability (skip already transcribed videos)
- Cost tracking and estimation
- Real-time progress monitoring
- Comprehensive error handling
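The resume capability and cost estimation listed above can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not the project's code: the helper names are hypothetical, videos are identified by URL, and the hourly rate is a caller-supplied placeholder rather than a real price.

```python
def pending_videos(video_urls, transcribed_urls):
    """Return videos that still need transcription, preserving input order."""
    done = set(transcribed_urls)
    return [url for url in video_urls if url not in done]


def estimate_cost(durations_seconds, price_per_hour):
    """Estimate cost from total audio duration and a caller-supplied rate."""
    total_hours = sum(durations_seconds) / 3600
    return round(total_hours * price_per_hour, 2)
```

Running the estimate before a batch starts gives an upper bound, since videos skipped by the resume check incur no further cost.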
📜 Transcript Processing
Meeting Transcripts
- Complete Transcript Database: Extract all committee meeting transcripts
- HTML Content Processing: Handle complex HTML formatting
- Metadata Extraction: Capture meeting dates, types, and participants
- Bulk Processing: Efficiently process multiple months/years
- Download Status Tracking: Monitor which transcripts have been downloaded
Transcript Features
- Interactive committee selection
- Full-year processing (all 12 months handled automatically)
- Formatted table views with status indicators
- CSV export functionality
- Download grouping for efficiency
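Processing an entire year amounts to iterating over twelve month-long date ranges. A minimal sketch of how such ranges could be generated is shown below; `month_ranges` is a hypothetical helper, and how the scraper actually queries each month is not specified here.

```python
import calendar
from datetime import date


def month_ranges(year):
    """Yield (first_day, last_day) for each of the 12 months of a year."""
    for month in range(1, 13):
        # monthrange returns (weekday of first day, number of days in month).
        last_day = calendar.monthrange(year, month)[1]
        yield date(year, month, 1), date(year, month, last_day)
```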
🤖 AI-Powered Analysis
Content Analysis
- Bill Discussion Detection: Identify when specific bills are discussed
- Amendment Tracking: Monitor proposed changes and modifications
- Speaker Identification: Track who proposed what changes
- Confidence Scoring: AI confidence levels for each analysis
- Status Classification: Categorize discussion outcomes (accepted, rejected, debated)
Analysis Data Includes
- Related transcript and bill IDs
- Bill identifiers mentioned in discussions
- Proposer names and roles
- Amendment types (new text, modification, deletion)
- Discussion status and outcomes
- AI confidence scores (0-1 scale)
- Relevant content excerpts
- Generated summaries
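To make the shape of an analysis result concrete, here is a hedged Python sketch of a record carrying the fields listed above. All class, field, and function names are hypothetical and do not reflect the project's actual schema; the only constraints taken from the text are the three amendment types and the 0-1 confidence scale.

```python
from dataclasses import dataclass
from enum import Enum


class AmendmentType(Enum):
    NEW_TEXT = "new_text"
    MODIFICATION = "modification"
    DELETION = "deletion"


@dataclass
class AnalysisRecord:
    transcript_id: int
    bill_id: int
    proposer: str
    amendment_type: AmendmentType
    status: str        # e.g. "accepted", "rejected", "debated"
    confidence: float  # AI confidence on a 0-1 scale

    def __post_init__(self):
        # Reject scores outside the documented 0-1 range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def confident(records, threshold=0.8):
    """Keep only records at or above a confidence threshold (assumed default)."""
    return [r for r in records if r.confidence >= threshold]
```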
🔍 Protocol Extraction
Structured Data Extraction
- LangExtract Integration: Advanced language processing for protocol structure
- Multiple Extraction Types: Bill discussions, committee decisions, amendments, speaker statements
- Confidence Scoring: Quality assessment for extractions
- Flexible Processing: Process specific transcripts or date ranges
- Python Integration: Seamless integration with Python extraction tools
Extraction Types
- Bill discussions and debates
- Committee decisions and votes
- Amendment proposals and changes
- Individual speaker statements
- Complete protocol structures
📊 Data Export & Formats
Export Formats
- CSV: Excel-compatible with UTF-8 BOM encoding
- JSON: Structured data for programmatic access
- HTML: Formatted output for web viewing
- TXT: Human-readable plain text format
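For readers unfamiliar with the UTF-8 BOM requirement, the following Python sketch shows how a CSV file can be written so that Excel detects the encoding and displays Cyrillic text correctly. It illustrates the technique only; `export_csv` is a hypothetical helper and not part of the tool.

```python
import csv


def export_csv(rows, fieldnames, path):
    """Write dict rows to a CSV file that Excel opens with Cyrillic intact."""
    # "utf-8-sig" prepends the UTF-8 byte order mark (bytes EF BB BF).
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Without the BOM, Excel typically falls back to a legacy code page and renders Bulgarian names as garbled characters.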
Export Features
- Bulgarian Text Support: Proper encoding for Cyrillic characters
- Individual Committee Files: Separate exports per committee
- Date Range Filtering: Export specific time periods
- Metadata Inclusion: Optionally include metadata and analysis results in exports
- Custom Output Directories: Organize exports efficiently
Specialized Exports
- Parliament members with full profiles
- Committee structures and memberships
- Legislative bills with extracted text
- Meeting transcripts with analysis
- Individual committee files with formatting
🔔 Monitoring & Automation
Scheduled Monitoring
- New Bill Detection: Automatically check for newly submitted bills
- Transcript Monitoring: Monitor for new committee meeting transcripts
- Notification Support: Alert systems for new content
- Configurable Periods: Set custom monitoring timeframes
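New-content detection with a configurable period can be reduced to two checks: the item has not been seen before, and it falls inside the monitoring window. The sketch below shows this for bills. The function name, the `id` and `submitted` keys, and the seven-day default are all assumptions made for illustration.

```python
from datetime import date, timedelta


def find_new_bills(fetched_bills, known_ids, days=7, today=None):
    """Return unseen bills submitted within the last `days` days."""
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    known = set(known_ids)
    return [
        bill
        for bill in fetched_bills
        if bill["id"] not in known and bill["submitted"] >= cutoff
    ]
```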
Automation Features
- Scheduled command execution
- Automatic analysis triggers
- Error handling and recovery
- Progress tracking and reporting
- Resource usage optimization
🌍 Bulgarian Language Support
Text Processing
- UTF-8 Encoding: Full support for Bulgarian Cyrillic characters
- BOM Encoding: CSV files include a UTF-8 byte order mark for Excel compatibility
- Character Transliteration: Safe filename generation (а→a, б→b, etc.)
- Word Count Accuracy: Proper counting for Bulgarian text
- Search Functionality: Bulgarian text search and filtering
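Transliteration for safe filenames can be sketched as a character map followed by slug cleanup. The Python example below is illustrative only: the mapping follows a common Bulgarian-to-Latin convention and may differ from the table the tool actually uses, and both function names are hypothetical.

```python
import re

# Assumed mapping; the tool's own transliteration table may differ.
BG_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ф": "f", "х": "h", "ц": "ts", "ч": "ch",
    "ш": "sh", "щ": "sht", "ъ": "a", "ь": "y", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    """Replace Bulgarian Cyrillic letters with Latin equivalents."""
    out = []
    for ch in text:
        latin = BG_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)  # leave non-Cyrillic characters untouched
        else:
            out.append(latin.capitalize() if ch.isupper() else latin)
    return "".join(out)


def safe_filename(name: str) -> str:
    """Build a lowercase ASCII slug suitable for use as a filename."""
    slug = re.sub(r"[^a-z0-9]+", "-", transliterate(name).lower()).strip("-")
    return slug or "untitled"


print(safe_filename("Комисия по бюджет"))  # komisiya-po-byudzhet
```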
Internationalization
- Bulgarian interface elements
- Proper date formatting
- Currency and number formatting
- Cultural considerations in data processing
🔧 Technical Features
Performance Optimization
- Batch Processing: Efficient handling of large datasets
- Progress Tracking: Real-time progress bars and statistics
- Memory Management: Optimized for large file processing
- Concurrent Operations: Parallel processing where possible
- Resume Capability: Restart interrupted operations
Error Handling
- Robust Error Recovery: Graceful handling of API failures
- Detailed Logging: Comprehensive error reporting
- Retry Logic: Automatic retries with exponential backoff
- Validation: Input and data validation throughout
- Timeout Management: Appropriate timeouts for network operations
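Retry logic with exponential backoff follows a simple pattern: wait, double the wait, and try again until the attempts run out. The sketch below shows that pattern in Python. The function name, the default of four attempts, and the 0.5 second base delay are assumptions for illustration, not the tool's configured values.

```python
import time


def retry(operation, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying on failure with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without actually waiting. In practice the `except` clause would usually be narrowed to network and timeout errors so that programming mistakes fail immediately.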
Database Features
- Eloquent ORM: Clean data access patterns
- Relationship Management: Complex many-to-many relationships
- Migration System: Version-controlled database changes
- Indexing Strategy: Optimized database performance
- Data Integrity: Foreign key constraints and validation
📋 Command Summary
Core Scraping Commands
- parliament:scrape - Extract all parliament members
- committees:scrape - Extract committees and relationships
- bills:scrape - Extract legislative bills with options
- transcripts:scrape - Extract meeting transcripts
- videos:transcribe-v2 - Transcribe meeting videos
Analysis Commands
- analyze:transcripts - AI analysis of transcript content
- transcripts:extract - Extract structured protocol data
Export Commands
- parliament:export-csv - Export members to CSV
- committees:export-csv - Export committees to CSV
- bills:export-csv - Export bills to CSV
- transcripts:export-analysis - Export for external analysis
- transcripts:export-committee - Advanced committee exports
Utility Commands
- transcripts:list - Interactive transcript management
- bills:check-new - Monitor for new bills
- transcripts:check-new - Monitor for new transcripts