callmefair.mitigation.fair_log
Logging and Result Management for Bias Mitigation Experiments
This module provides comprehensive logging and result management capabilities for bias mitigation experiments. It includes CSV logging functionality and result aggregation tools for analyzing multiple experiments.
The module implements:
- CSV-based experiment logging with automatic file management
- Multiprocessing support for efficient result aggregation
- Automatic cleanup of processed files
- Comprehensive error handling and logging
- Classes:
csvLogger: CSV-based logger for experiment results
- Functions:
read_csv_file: Read a single CSV file with error handling
aggregate_csv_files: Aggregate multiple CSV files using multiprocessing
Example
>>> from callmefair.mitigation.fair_log import csvLogger, aggregate_csv_files
>>>
>>> # Create logger for experiment
>>> logger = csvLogger('experiment_2024_01_15')
>>>
>>> # Log experiment results
>>> results = [
...     {'model': 'RandomForest', 'BM': 'baseline', 'accuracy': 0.85},
...     {'model': 'RandomForest', 'BM': 'reweighing', 'accuracy': 0.83}
... ]
>>> logger(results)
>>>
>>> # Aggregate results from multiple experiments
>>> aggregate_csv_files('./results/', './results/aggregated_results.csv')
Attributes
Classes
csvLogger | CSV-based logger for experiment results.
Functions
aggregate_csv_files | Aggregate multiple CSV files from a folder into a single CSV file using multiprocessing.
read_csv_file | Read a single CSV file and return its DataFrame.
Module Contents
- class callmefair.mitigation.fair_log.csvLogger(filename, path='results')[source]
CSV-based logger for experiment results.
This class provides a simple interface for logging experiment results to CSV files. It automatically creates the output directory if it doesn’t exist and appends results to the specified file.
- Variables:
- Parameters:
Example
>>> logger = csvLogger('experiment_results', path='./results/')
>>>
>>> # Log a single result
>>> result = {'model': 'RandomForest', 'accuracy': 0.85}
>>> logger([result])
>>>
>>> # Log multiple results
>>> results = [
...     {'model': 'RandomForest', 'accuracy': 0.85},
...     {'model': 'LogisticRegression', 'accuracy': 0.82}
... ]
>>> logger(results)
Initialize the CSV logger.
- Parameters:
Example
>>> logger = csvLogger('experiment_2024_01_15', path='./experiments/')
- __call__(named_dict)[source]
Log experiment results to CSV file.
This method takes a list of dictionaries (each representing one experiment result) and appends them to the CSV file. The method automatically handles DataFrame conversion and CSV writing.
- Parameters:
named_dict (list[dict]) – List of dictionaries containing experiment results. Each dictionary should have consistent keys across all entries.
- Return type:
None
Example
>>> logger = csvLogger('experiment_results')
>>>
>>> # Log single result
>>> result = {'model': 'RandomForest', 'accuracy': 0.85, 'fairness': 0.92}
>>> logger([result])
>>>
>>> # Log multiple results
>>> results = [
...     {'model': 'RandomForest', 'accuracy': 0.85, 'fairness': 0.92},
...     {'model': 'LogisticRegression', 'accuracy': 0.82, 'fairness': 0.89}
... ]
>>> logger(results)
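The append behaviour described above can be illustrated with a minimal sketch. It is not the `csvLogger` implementation: `append_results` is a hypothetical helper, and writing the header only when the file does not yet exist is an assumption about how repeated calls might be kept readable as one table.

```python
import os
import tempfile

import pandas as pd


def append_results(rows, csv_path):
    """Append a list of result dicts to csv_path (hypothetical helper)."""
    # Convert the list of dicts to a DataFrame, one row per experiment result.
    df = pd.DataFrame(rows)
    # Assumption: emit the header only on the first write so later appends
    # do not repeat the column names in the middle of the file.
    write_header = not os.path.exists(csv_path)
    df.to_csv(csv_path, mode="a", header=write_header, index=False)


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "experiment_results.csv")
    append_results([{"model": "RandomForest", "accuracy": 0.85}], csv_path)
    append_results([{"model": "LogisticRegression", "accuracy": 0.82}], csv_path)
    # Read the file back to confirm both calls landed in one table.
    logged = pd.read_csv(csv_path)
```

Keeping the keys consistent across entries, as the parameter description asks, matters here because rows are appended positionally: a later call with different keys would be written under the original header.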
- __check_path__()[source]
Check and create the output directory if it doesn’t exist.
This method ensures that the output directory exists before attempting to write CSV files. If the directory doesn’t exist, it creates it.
Example
>>> logger = csvLogger('test', path='./new_directory/')
>>> # Directory './new_directory/' is automatically created
- Return type:
None
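A directory check of this kind is commonly written with `os.makedirs`. The sketch below shows that pattern under the assumption that a missing directory should be created and an existing one left alone; `ensure_dir` is a hypothetical name, not part of this module.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path if it is missing; do nothing if it already exists."""
    # exist_ok=True makes the call idempotent, so it is safe to run
    # before every write.
    os.makedirs(path, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "new_directory")
    ensure_dir(target)
    ensure_dir(target)  # second call is a no-op rather than an error
    created = os.path.isdir(target)
```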
- callmefair.mitigation.fair_log.aggregate_csv_files(folder_path, output_file='aggregated_data.csv', num_processes=10)[source]
Aggregate multiple CSV files from a folder into a single CSV file using multiprocessing.
This function efficiently combines multiple CSV files into a single file for analysis. It uses multiprocessing for improved performance on large datasets and includes comprehensive error handling and logging.
The function:
1. Finds all CSV files in the specified folder
2. Reads them in parallel using multiprocessing
3. Combines all DataFrames into a single DataFrame
4. Saves the aggregated data to the output file
5. Optionally deletes the original files after successful aggregation
- Parameters:
- Raises:
Exception – If aggregation fails due to file system or processing errors
- Return type:
None
Example
>>> # Aggregate all CSV files in the results folder
>>> aggregate_csv_files(
...     folder_path='./results/',
...     output_file='./results/aggregated_results.csv',
...     num_processes=8
... )
>>>
>>> # Use all available CPU cores
>>> aggregate_csv_files(
...     folder_path='./experiments/',
...     output_file='./experiments/all_results.csv',
...     num_processes=None
... )
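Steps 1 to 4 of the aggregation can be sketched as follows. This is an illustration, not the module's code: `aggregate_sketch` and `num_workers` are hypothetical names, the deletion step is left out, and a `ThreadPool` stands in for a process pool so that the example runs in any environment. `ThreadPool` exposes the same `map` interface as `multiprocessing.Pool`, so swapping one for the other is a small change.

```python
import glob
import os
import tempfile
from multiprocessing.pool import ThreadPool

import pandas as pd


def aggregate_sketch(folder_path, output_file, num_workers=4):
    """Combine every CSV in folder_path into output_file; return the row count."""
    # Step 1: find the CSV files (sorted so the row order is deterministic).
    csv_files = sorted(glob.glob(os.path.join(folder_path, "*.csv")))
    if not csv_files:
        return 0
    # Step 2: read the files in parallel (thread pool used as a stand-in).
    with ThreadPool(num_workers) as pool:
        frames = pool.map(pd.read_csv, csv_files)
    # Step 3: stack all frames into a single DataFrame.
    combined = pd.concat(frames, ignore_index=True)
    # Step 4: write the aggregated table.
    combined.to_csv(output_file, index=False)
    return len(combined)


with tempfile.TemporaryDirectory() as tmp:
    in_dir = os.path.join(tmp, "results")
    os.makedirs(in_dir)
    pd.DataFrame([{"model": "RandomForest", "accuracy": 0.85}]).to_csv(
        os.path.join(in_dir, "run_1.csv"), index=False
    )
    pd.DataFrame([{"model": "LogisticRegression", "accuracy": 0.82}]).to_csv(
        os.path.join(in_dir, "run_2.csv"), index=False
    )
    # The output is written outside in_dir so a rerun would not re-read it.
    out_file = os.path.join(tmp, "aggregated.csv")
    total_rows = aggregate_sketch(in_dir, out_file)
    aggregated = pd.read_csv(out_file)
```

Writing the output outside the input folder is a choice made for this sketch: if the aggregated file sits next to its inputs, a second run would pick it up as one more CSV to combine.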
- callmefair.mitigation.fair_log.read_csv_file(file_path)[source]
Read a single CSV file and return its DataFrame.
This function provides a robust way to read CSV files with comprehensive error handling. It’s designed to work with the multiprocessing aggregation functionality.
- Parameters:
file_path (str) – Path to the CSV file to read
- Returns:
DataFrame containing the CSV data. Returns an empty DataFrame if reading fails.
- Return type:
pd.DataFrame
Example
>>> df = read_csv_file('./results/experiment_1.csv')
>>> print(f"Loaded {len(df)} rows from CSV file")
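The documented contract, a DataFrame on success and an empty DataFrame on failure, can be sketched as below. `read_csv_safely` is a hypothetical stand-in, and logging the error through the standard `logging` module is an assumption about how a failure might be reported.

```python
import logging

import pandas as pd


def read_csv_safely(file_path):
    """Return the CSV contents, or an empty DataFrame if reading fails."""
    try:
        return pd.read_csv(file_path)
    except Exception as exc:
        # Assumption: failures are logged and swallowed so that one bad
        # file does not abort a parallel aggregation run.
        logging.getLogger(__name__).error("Could not read %s: %s", file_path, exc)
        return pd.DataFrame()


# A path that does not exist yields an empty frame instead of an exception.
missing = read_csv_safely("./no_such_dir/no_such_file.csv")
```

Callers that need to distinguish a failed read from a genuinely empty file would have to check the log, since both cases return a frame with no rows.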