Configuration Guide
This guide provides detailed information about configuring the TSOC Data Analysis package for different power systems and use cases.
The TSOC Data Analysis package uses a centralized configuration system, so users can customize the analysis without editing the analysis code itself. Configuration parameters are stored in system_configuration.py and organized into logical sections.
Configuration Areas
Data File Mappings - Excel file names and column prefixes
Data Validation Settings - Limits, thresholds, and validation rules
Representative Operations Parameters - Clustering and analysis settings
Plotting and Visualization Settings - Plot styles and formatting
Shared Utility Functions - Common utilities and helpers
System Configuration
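The main configuration file, system_configuration.py, holds the package-wide configurable parameters (options specific to the enhanced clustering function are passed as function arguments instead; see Enhanced Clustering Configuration):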
File Structure:
# Data directory and file mappings
DATA_DIR = 'raw_data/'
FILES = {...}
COLUMN_PREFIXES = {...}
# Data validation settings
DATA_VALIDATION = {...}
ENHANCED_DATA_VALIDATION = {...}
# Representative operations settings
REPRESENTATIVE_OPS = {...}
# Plotting settings
PLOT_STYLE = 'seaborn-v0_8'
PLOT_PALETTE = 'husl'
FIGURE_SIZES = {...}
FONT_SIZES = {...}
Data File Configuration
File Mappings:
Configure the Excel file names for your power system:
FILES = {
    'substation_mw': 'substation_active_power.xlsx',
    'substation_mvar': 'substation_reactive_power.xlsx',
    'wind_power': 'wind_farm_active_power.xlsx',
    'shunt_elements': 'shunt_element_reactive_power.xlsx',
    'gen_voltage': 'generator_voltage_setpoints.xlsx',
    'gen_mvar': 'generator_reactive_power.xlsx'
}
Column Prefixes:
Define the column naming conventions:
COLUMN_PREFIXES = {
    'substation_mw': 'ss_mw_',
    'substation_mvar': 'ss_mvar_',
    'wind_power': 'wind_mw_',
    'shunt_elements': 'shunt_',
    'gen_voltage': 'gen_v_',
    'gen_mvar': 'gen_mvar_'
}
Customization Examples:
# For different file naming conventions
FILES['substation_mw'] = 'load_active_power.xlsx'
FILES['wind_power'] = 'renewable_generation.xlsx'
# For different column prefixes
COLUMN_PREFIXES['substation_mw'] = 'load_mw_'
COLUMN_PREFIXES['wind_power'] = 'renewable_mw_'
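As a rough illustration of how these two mappings fit together, the sketch below resolves a file path from FILES and checks column names against the matching entry in COLUMN_PREFIXES. The helper names resolve_path and unexpected_columns, and the sample column names, are hypothetical and are not part of the package; only the dictionary keys and values come from this guide.

```python
from pathlib import Path

# Values copied from this guide (only two entries shown for brevity).
DATA_DIR = 'raw_data/'
FILES = {
    'substation_mw': 'substation_active_power.xlsx',
    'wind_power': 'wind_farm_active_power.xlsx',
}
COLUMN_PREFIXES = {
    'substation_mw': 'ss_mw_',
    'wind_power': 'wind_mw_',
}


def resolve_path(key, data_dir=DATA_DIR, files=FILES):
    """Hypothetical helper: full path of the Excel file registered under `key`."""
    return Path(data_dir) / files[key]


def unexpected_columns(key, columns, prefixes=COLUMN_PREFIXES):
    """Hypothetical helper: columns that do not start with the prefix for `key`."""
    prefix = prefixes[key]
    return [c for c in columns if not c.startswith(prefix)]


print(resolve_path('wind_power'))
# Made-up column names; 'load_east' does not follow the 'ss_mw_' convention.
print(unexpected_columns('substation_mw', ['ss_mw_north', 'ss_mw_south', 'load_east']))
```

A check like this can flag a mismatch early when a file is renamed but its column prefix is left unchanged.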
Data Validation Configuration
Basic Validation Settings:
DATA_VALIDATION = {
    'type_checks': {
        'real_numbers': ['ss_mw_', 'ss_mvar_', 'wind_mw_'],
        'integers': ['shunt_tap_']
    },
    'limit_checks': {
        'power_limits': {
            'wind': {'min_mw': 0, 'max_mw': 100},
            'substation': {'min_mw': -100, 'max_mw': 100}
        }
    },
    'gap_filling': {
        'max_gap_steps': 3,
        'advanced_max_gap_steps': 12,
        'remove_large_gaps_threshold': 24,
        'enable_advanced_gap_filling': True
    }
}
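To make the limit checks concrete, here is a minimal sketch of how a series could be tested against the power_limits entries. The function find_limit_violations and the sample values are hypothetical; how the package itself applies these limits is not described in this guide.

```python
# Limits copied from DATA_VALIDATION['limit_checks'] above.
DATA_VALIDATION = {
    'limit_checks': {
        'power_limits': {
            'wind': {'min_mw': 0, 'max_mw': 100},
            'substation': {'min_mw': -100, 'max_mw': 100},
        }
    }
}


def find_limit_violations(values, group, config=DATA_VALIDATION):
    """Hypothetical helper: (index, value) pairs outside the limits for `group`.

    Missing samples (None) are skipped here; they are a gap-filling concern.
    """
    limits = config['limit_checks']['power_limits'][group]
    return [
        (i, v)
        for i, v in enumerate(values)
        if v is not None and not (limits['min_mw'] <= v <= limits['max_mw'])
    ]


# Made-up wind readings in MW: one negative, one above max_mw, one missing.
wind_series = [12.5, -3.0, 99.9, 140.2, None]
violations = find_limit_violations(wind_series, 'wind')
print(violations)
```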
Enhanced Validation Settings:
ENHANCED_DATA_VALIDATION = {
    'advanced_gap_filling': {
        'enable_advanced_gap_filling': True,
        'default_method': 'adaptive',
        'context_size_ratio': 0.25,
        'min_context_points': 10,
        'adaptive_thresholds': {
            'small_gap_size': 3,
            'medium_gap_size': 6,
            'large_gap_size': 12,
        }
    },
    'outlier_detection': {
        'default_methods': ['iqr', 'isolation_forest'],
        'contamination': 0.1,
        'zscore_threshold': 3.0,
        'modified_zscore_threshold': 3.5,
        'iqr_multiplier': 1.5,
    },
    'variable_groups': {
        'generators': ['gen_mvar_'],
        'substations': ['ss_mw_', 'ss_mvar_'],
        'wind': ['wind_mw_'],
        'shunts': ['shunt_mvar_', 'shunt_tap_'],
        'voltages': ['gen_v_']
    }
}
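The outlier_detection thresholds correspond to standard statistical rules. The sketch below shows what iqr_multiplier and modified_zscore_threshold mean under the usual textbook definitions (IQR fences, and a modified z-score based on the median absolute deviation with the conventional 0.6745 constant). It uses only the Python standard library; the function names and sample data are hypothetical, and the package's own implementation may differ.

```python
import statistics

# Thresholds copied from ENHANCED_DATA_VALIDATION['outlier_detection'].
OUTLIER_CFG = {'iqr_multiplier': 1.5, 'modified_zscore_threshold': 3.5}


def iqr_outliers(values, multiplier=OUTLIER_CFG['iqr_multiplier']):
    """Values outside [Q1 - m*IQR, Q3 + m*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - multiplier * spread, q3 + multiplier * spread
    return [v for v in values if v < low or v > high]


def modified_zscore_outliers(values,
                             threshold=OUTLIER_CFG['modified_zscore_threshold']):
    """Values whose modified z-score, 0.6745 * (x - median) / MAD, exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # All deviations collapse to zero; the score is undefined, so flag nothing.
        return []
    return [v for v in values if abs(0.6745 * (v - median) / mad) > threshold]


# Made-up readings with one obvious spike.
readings = [10, 11, 12, 11, 10, 12, 11, 50]
print(iqr_outliers(readings))
print(modified_zscore_outliers(readings))
```

Lowering either threshold flags more points; raising it makes detection more conservative.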
Customization Examples:
# Adjust power limits for different systems
DATA_VALIDATION['limit_checks']['power_limits']['wind']['max_mw'] = 200
DATA_VALIDATION['limit_checks']['power_limits']['substation']['max_mw'] = 500
# Enable more aggressive gap filling
DATA_VALIDATION['gap_filling']['max_gap_steps'] = 6
DATA_VALIDATION['gap_filling']['advanced_max_gap_steps'] = 24
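The three gap_filling thresholds suggest a tiered treatment of missing data. The sketch below shows one plausible reading, which is an assumption rather than documented behaviour: gaps up to max_gap_steps get simple filling, gaps up to advanced_max_gap_steps get advanced filling when it is enabled, gaps at or beyond remove_large_gaps_threshold are removed, and anything in between is left unfilled. The functions find_gaps and classify_gap are hypothetical.

```python
# Defaults copied from DATA_VALIDATION['gap_filling'].
GAP_FILLING = {
    'max_gap_steps': 3,
    'advanced_max_gap_steps': 12,
    'remove_large_gaps_threshold': 24,
    'enable_advanced_gap_filling': True,
}


def find_gaps(values):
    """Return (start_index, length) for each run of consecutive None values."""
    gaps, start = [], None
    for i, v in enumerate(values):
        if v is None and start is None:
            start = i
        elif v is not None and start is not None:
            gaps.append((start, i - start))
            start = None
    if start is not None:
        gaps.append((start, len(values) - start))
    return gaps


def classify_gap(length, cfg=GAP_FILLING):
    """Assumed tiering of a gap by its length in time steps."""
    if length <= cfg['max_gap_steps']:
        return 'simple'
    if cfg['enable_advanced_gap_filling'] and length <= cfg['advanced_max_gap_steps']:
        return 'advanced'
    if length >= cfg['remove_large_gaps_threshold']:
        return 'remove'
    return 'unfilled'


# Made-up series: a 2-step gap followed by an 8-step gap.
series = [1.0, None, None, 4.0] + [None] * 8 + [5.0]
gaps = find_gaps(series)
print([(start, length, classify_gap(length)) for start, length in gaps])
```

Under this reading, raising max_gap_steps to 6 and advanced_max_gap_steps to 24, as in the customization example, moves more gaps into the two filling tiers.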
Representative Operations Configuration
Clustering Parameters:
REPRESENTATIVE_OPS = {
    'defaults': {
        'k_max': 10,                    # Maximum clusters to test
        'random_state': 42,             # Reproducibility seed
        'mapgl_belt_multiplier': 1.1,   # MAPGL belt definition
        'fallback_clusters': 2          # Fallback if no quality clusters
    },
    'quality_thresholds': {
        'min_silhouette': 0.25,         # Minimum clustering quality
        'silhouette_excellent': 0.7,    # Excellent quality threshold
        'silhouette_good': 0.5,         # Good quality threshold
    },
    'ranking_weights': {
        'silhouette_weight': 1000,      # Multi-objective ranking weights
        'calinski_harabasz_weight': 1,
        'davies_bouldin_weight': 10
    },
    'output_files': {
        'representative_points': 'representative_operating_points.csv',
        'clustering_summary': 'clustering_summary.txt'
    }
}
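To show how the quality thresholds and ranking weights might interact, the sketch below labels a silhouette score and combines three cluster-validity metrics into one ranking score. The combination formula is an assumption: it adds the weighted silhouette and Calinski-Harabasz scores (higher is better for both) and subtracts the weighted Davies-Bouldin score (lower is better). The function names, the 'acceptable' and 'rejected' labels, and the metric values are hypothetical.

```python
# Values copied from REPRESENTATIVE_OPS above.
QUALITY = {'min_silhouette': 0.25, 'silhouette_excellent': 0.7, 'silhouette_good': 0.5}
WEIGHTS = {'silhouette_weight': 1000, 'calinski_harabasz_weight': 1,
           'davies_bouldin_weight': 10}


def quality_label(silhouette, q=QUALITY):
    """Map a silhouette score to a coarse label (label names are illustrative)."""
    if silhouette >= q['silhouette_excellent']:
        return 'excellent'
    if silhouette >= q['silhouette_good']:
        return 'good'
    if silhouette >= q['min_silhouette']:
        return 'acceptable'
    return 'rejected'


def ranking_score(silhouette, calinski_harabasz, davies_bouldin, w=WEIGHTS):
    """Assumed multi-objective score; Davies-Bouldin is subtracted because lower is better."""
    return (w['silhouette_weight'] * silhouette
            + w['calinski_harabasz_weight'] * calinski_harabasz
            - w['davies_bouldin_weight'] * davies_bouldin)


# Made-up metrics per candidate k: (silhouette, Calinski-Harabasz, Davies-Bouldin).
candidates = {2: (0.55, 120.0, 0.9), 3: (0.62, 150.0, 0.8), 4: (0.20, 160.0, 1.4)}

# Drop candidates below min_silhouette, then rank the rest.
eligible = {k: m for k, m in candidates.items() if quality_label(m[0]) != 'rejected'}
best_k = max(eligible, key=lambda k: ranking_score(*eligible[k]))
print(best_k, quality_label(candidates[best_k][0]))
```

With a silhouette weight of 1000, differences in silhouette dominate the ranking and the other two metrics mainly break near-ties.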
Customization Examples:
# For larger power systems (more clusters needed)
REPRESENTATIVE_OPS['defaults']['k_max'] = 20
# For higher quality clustering requirements
REPRESENTATIVE_OPS['quality_thresholds']['min_silhouette'] = 0.4
# For different MAPGL belt definition
REPRESENTATIVE_OPS['defaults']['mapgl_belt_multiplier'] = 1.15
Enhanced Clustering Configuration
The enhanced clustering function extract_representative_ops_enhanced provides additional configuration options for advanced clustering techniques:
Enhanced Parameters:
# Enhanced clustering function parameters (not in REPRESENTATIVE_OPS)
extract_representative_ops_enhanced(
    df,
    max_power=850,
    MAPGL=200,
    output_dir='results',
    # Enhanced preprocessing options
    use_enhanced_preprocessing=True,     # Enable advanced data preprocessing
    outlier_threshold=2.5,               # Standard deviations for outlier detection
    correlation_threshold=0.95,          # High correlation removal threshold
    # Alternative algorithms
    try_alternative_algorithms=True,     # Test DBSCAN, Agglomerative, GMM
    dbscan_eps=0.1,                      # DBSCAN neighborhood size
    dbscan_min_samples=3,                # DBSCAN minimum samples
    # Dimensionality reduction
    use_dimensionality_reduction=True,   # Enable PCA preprocessing
    pca_variance_threshold=0.95,         # PCA variance to retain
    # Feature engineering
    engineer_features=True,              # Create additional features
    include_temporal_features=True,      # Add hour/day/month features
    include_cyclical_features=True,      # Add sine/cosine temporal features
)
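To illustrate what a variance threshold such as pca_variance_threshold=0.95 typically means, the sketch below picks the smallest number of principal components whose cumulative explained-variance ratio reaches the threshold. The function components_for_variance and the ratios are hypothetical; whether the package selects components exactly this way is an assumption.

```python
from itertools import accumulate


def components_for_variance(explained_variance_ratio, threshold=0.95):
    """Smallest number of leading components whose cumulative ratio >= threshold."""
    for n, cumulative in enumerate(accumulate(explained_variance_ratio), start=1):
        if cumulative >= threshold:
            return n
    # Threshold never reached (e.g. ratios were truncated): keep every component.
    return len(explained_variance_ratio)


# Made-up explained-variance ratios, sorted in descending order as PCA reports them.
ratios = [0.60, 0.25, 0.08, 0.04, 0.03]
n_components = components_for_variance(ratios, threshold=0.95)
print(n_components)
```

For reference, scikit-learn's PCA performs an equivalent selection when n_components is given as a float between 0 and 1.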
Feature Engineering Options:
The enhanced clustering automatically creates additional features when engineer_features=True:
Power Factors: Reactive to active power ratios for generators
Load Diversity: Standard deviation across substations
Wind Penetration: Wind generation as percentage of total load
Temporal Features: Hour of day, day of week, month (when enabled)
Cyclical Features: Sine/cosine transformations of temporal features
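The cyclical features in the list above can be sketched as follows. Sine/cosine encoding maps a periodic quantity onto a circle, so that hour 23 and hour 0 end up close together instead of 23 units apart. The function names and the exact feature names are hypothetical; only the idea of sine/cosine transforms of hour, day of week, and month comes from this guide.

```python
import math
from datetime import datetime


def cyclical_encode(value, period):
    """Return (sin, cos) of `value` placed on a circle of length `period`."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def temporal_features(ts):
    """Hypothetical feature builder for a single timestamp."""
    hour_sin, hour_cos = cyclical_encode(ts.hour, 24)
    dow_sin, dow_cos = cyclical_encode(ts.weekday(), 7)      # Monday = 0
    month_sin, month_cos = cyclical_encode(ts.month - 1, 12)  # January = 0
    return {
        'hour_sin': hour_sin, 'hour_cos': hour_cos,
        'dow_sin': dow_sin, 'dow_cos': dow_cos,
        'month_sin': month_sin, 'month_cos': month_cos,
    }


features = temporal_features(datetime(2024, 1, 1, 6, 0))
print(features)

# Distance between 23:00 and 00:00 in the encoded space is small, as intended.
late, midnight = cyclical_encode(23, 24), cyclical_encode(0, 24)
print(math.dist(late, midnight))
```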
Algorithm Selection Strategy:
The enhanced method tests multiple algorithms in this order:
1. Standard K-means (baseline)
2. DBSCAN (density-based clustering)
3. Agglomerative Clustering (hierarchical)
4. Gaussian Mixture Models (probabilistic)
5. PCA + K-means (dimensionality reduction first)
The algorithm with the highest silhouette score is automatically selected.
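The selection rule described above amounts to taking the maximum over per-algorithm silhouette scores. A minimal sketch, assuming each algorithm reports either a score or None when no score can be computed (for example, when DBSCAN produces fewer than two clusters); the function name, dictionary keys, and scores are made up for illustration.

```python
def select_best_algorithm(scores):
    """Return (name, score) with the highest silhouette, or None if nothing is valid."""
    valid = {name: s for name, s in scores.items() if s is not None}
    if not valid:
        return None
    # max() keeps the first maximum, so ties go to the earlier (baseline-first) entry.
    best = max(valid, key=valid.get)
    return best, valid[best]


# Hypothetical silhouette scores, listed in the testing order given above.
scores = {
    'kmeans': 0.41,
    'dbscan': None,          # no valid clustering found
    'agglomerative': 0.44,
    'gmm': 0.38,
    'pca_kmeans': 0.47,
}
choice = select_best_algorithm(scores)
print(choice)
```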
Performance vs Quality Trade-off:
# Quick enhanced clustering (moderate improvement)
rep_df, diagnostics = extract_representative_ops_enhanced(
    df, max_power=850, MAPGL=200,
    use_enhanced_preprocessing=True,
    try_alternative_algorithms=False,   # Skip alternative algorithms
    use_dimensionality_reduction=True
)
# Full enhanced clustering (maximum improvement, slower)
rep_df, diagnostics = extract_representative_ops_enhanced(
    df, max_power=850, MAPGL=200,
    use_enhanced_preprocessing=True,
    try_alternative_algorithms=True,    # Test all algorithms
    use_dimensionality_reduction=True,
    engineer_features=True              # Full feature engineering
)
Visualization Configuration
Plot Style Settings:
PLOT_STYLE = 'seaborn-v0_8'
PLOT_PALETTE = 'husl'
Figure Sizes:
FIGURE_SIZES = {
    'timeseries': (12, 8),
    'daily_profile': (10, 6),
    'monthly_profile': (10, 6),
    'comprehensive': (15, 10),
    'clustering': (16, 12)
}
Font Sizes:
FONT_SIZES = {
    'title': 16,
    'axis_label': 14,
    'tick_label': 12,
    'legend': 12,
    'annotation': 10
}
Customization Examples:
# For different plot styles
PLOT_STYLE = 'default'
PLOT_PALETTE = 'viridis'
# For larger plots
FIGURE_SIZES['comprehensive'] = (20, 15)
FIGURE_SIZES['clustering'] = (24, 18)
# For different font sizes
FONT_SIZES['title'] = 18
FONT_SIZES['axis_label'] = 16
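One way these settings could reach Matplotlib is through its rcParams. The sketch below builds an rcParams-style dictionary from FIGURE_SIZES and FONT_SIZES without importing Matplotlib. The rcParams keys are real Matplotlib settings, but the helper build_rc_params and the choice to map 'annotation' to 'font.size' are assumptions, not the package's documented behaviour.

```python
# Values copied from this guide.
FIGURE_SIZES = {
    'timeseries': (12, 8),
    'daily_profile': (10, 6),
    'monthly_profile': (10, 6),
    'comprehensive': (15, 10),
    'clustering': (16, 12),
}
FONT_SIZES = {'title': 16, 'axis_label': 14, 'tick_label': 12,
              'legend': 12, 'annotation': 10}


def build_rc_params(plot_type, font_sizes=FONT_SIZES, figure_sizes=FIGURE_SIZES):
    """Hypothetical helper: translate the config into Matplotlib rcParams keys."""
    return {
        'figure.figsize': figure_sizes[plot_type],
        'axes.titlesize': font_sizes['title'],
        'axes.labelsize': font_sizes['axis_label'],
        'xtick.labelsize': font_sizes['tick_label'],
        'ytick.labelsize': font_sizes['tick_label'],
        'legend.fontsize': font_sizes['legend'],
        # Assumption: annotations inherit the base font size.
        'font.size': font_sizes['annotation'],
    }


rc = build_rc_params('clustering')
print(rc)
```

With Matplotlib installed, a dictionary like this can be applied with matplotlib.rcParams.update(rc), and the configured style with matplotlib.pyplot.style.use(PLOT_STYLE). Note that the 'seaborn-v0_8' style name is only available in Matplotlib 3.6 and later.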