AudioCompass is a comprehensive evaluation framework for audio and multimodal language models. This platform allows researchers and developers to benchmark various voice assistant models using standardized datasets and metrics.
- Overview
- Installation
- Data Preparation
- Model Preparation
- Usage and Evaluation
- Adding New Models
- Adding New Benchmarks
## Overview

AudioCompass provides a unified interface to evaluate the capabilities of various voice assistant models. The framework supports:
- Audio-only input processing
- Text-only input processing
- Mixed audio and text inputs
- Integration with various benchmarks and evaluation metrics
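For orientation, here is a minimal sketch of how these three input modes map onto the `VoiceAssistant` interface described later in this README (see "Adding New Models"). `EchoAssistant` and its return values are hypothetical placeholders, and the import path assumes the base class lives at `src/models/base.py`, as the relative import in the model template suggests.

```python
# Hypothetical sketch only: a stub model illustrating the three input modes
# exposed by the VoiceAssistant base class (audio-only, text-only, audio + text).
from src.models.base import VoiceAssistant

class EchoAssistant(VoiceAssistant):
    def generate_a2t(self, audio, max_new_tokens=2048):
        # Audio-only input -> text response
        return "response generated from audio"

    def generate_t2t(self, text):
        # Text-only input -> text response
        return f"response generated from text: {text}"

    def generate_at2t(self, audio, text, max_new_tokens=2048):
        # Mixed audio + text input -> text response
        return f"response generated from audio plus text: {text}"
```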
## Installation

```bash
# Clone the repository
git clone https://github.com/VoiceAgentGroup/AudioCompass.git
cd AudioCompass

# Set up the environment
conda create -n audiocompass python=3.12 -y
conda activate audiocompass
pip install -r requirements.txt
```

## Usage and Evaluation

To run a benchmark on a specific model:

```bash
python main.py --model-name <model_name> --benchmark <dataset_name> --subset <subset_name> --split <split_name> --output-dir <output_directory> --cache-dir <cache-directory>
```

Add the `--offline` flag to run in environments without internet access, assuming models and datasets are already cached. For example:

```bash
python main.py --model-name speechgpt2 --benchmark voicebench --subset alpacaeval --split test --output-dir output --cache-dir cache --offline
```

To list available models and benchmarks:
```python
from src.models import list_models
print(list_models())

from src.benchmarks import list_benchmarks
print(list_benchmarks())
```
## Data Preparation

- Benchmark datasets are expected to be located within a `datas` subdirectory inside the specified cache directory (`--cache-dir`, defaults to `./cache`).
- Ensure the cache directory exists (e.g., create `./cache/datas`).
- For datasets requiring manual download, place the datasets into `<cache_dir>/datas/`:
  - OpenAudioBench
  - VoxEval
  - seed-tts-eval
  - storycloze
Example cache directory structure for data:
```
<cache_dir>/
└── datas/
    ├── OpenAudioBench/
    ├── VoxEval/
    ├── seedtts_testset/
    ├── storycloze/
    │   ├── sSC/
    │   └── tSC/
    └── ... (other datasets)
```
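For example, the data cache could be prepared as follows; this is a sketch assuming the default `./cache` location, and the `~/Downloads` source paths are placeholders for wherever the manually downloaded benchmarks actually live.

```bash
# Create the data cache directory used by --cache-dir ./cache
mkdir -p ./cache/datas

# Move manually downloaded benchmarks into place (source paths are placeholders)
mv ~/Downloads/OpenAudioBench ./cache/datas/
mv ~/Downloads/VoxEval ./cache/datas/
mv ~/Downloads/seedtts_testset ./cache/datas/
mv ~/Downloads/storycloze ./cache/datas/
```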
## Model Preparation

- Models are typically downloaded and cached automatically into a `models` subdirectory within the specified cache directory (`--cache-dir`, defaults to `./cache`).
- Ensure the cache directory exists (e.g., create `./cache/models`).
- For models requiring manual download, place the model files within `<cache_dir>/models/`.
Example cache directory structure for models:
```
<cache_dir>/
└── models/
    ├── WavLM-large-finetuned/
    │   └── ... (model files)
    └── ... (other models downloaded automatically or manually)
```
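The model cache can be prepared the same way; a short sketch, again assuming the default `./cache` location and a placeholder source path:

```bash
# Create the model cache directory used by --cache-dir ./cache
mkdir -p ./cache/models

# Place a manually downloaded checkpoint (source path is a placeholder)
mv ~/Downloads/WavLM-large-finetuned ./cache/models/
```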
## Adding New Models

To add a new model to AudioCompass, follow these steps:

- Create a new Python file in the `src/models` directory, e.g., `src/models/newmodel.py`.
- Implement a class that inherits from the `VoiceAssistant` base class:
```python
# src/models/newmodel.py
from .base import VoiceAssistant


class NewModelAssistant(VoiceAssistant):
    def __init__(self):
        # Initialize your model here
        pass

    def generate_a2t(self, audio, max_new_tokens=2048):
        # Generate a text response from audio-only input
        pass

    def generate_t2t(self, text):
        # Generate a text response from text-only input
        pass

    def generate_at2t(self, audio, text, max_new_tokens=2048):
        # Generate a text response from mixed audio and text input
        pass

    # And other necessary methods for your model
```

- Update the `src/models/__init__.py` file to import and register your new model:
```python
# In src/models/__init__.py
from .newmodel import NewModelAssistant

# Add to the model_cls_mapping dictionary
model_cls_mapping = {
    # existing models...
    'new_model_name': ('.newmodel', 'NewModelAssistant'),
}
```
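Once registered, the new model should show up under its mapping key. A quick sanity check, assuming the key `new_model_name` used above:

```python
from src.models import list_models

# 'new_model_name' should now appear among the registered models
print(list_models())
```

It can then be evaluated like any other model, e.g. `python main.py --model-name new_model_name --benchmark voicebench --subset alpacaeval --split test --output-dir output --cache-dir cache`.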
## Adding New Benchmarks

To add a new benchmark to AudioCompass, follow these steps:

- Create a new Python file in the `src/benchmarks` directory, e.g., `src/benchmarks/newbenchmark/newbenchmark.py`.
- Implement a benchmark class that inherits from the `BaseBenchmark` class:
```python
# src/benchmarks/newbenchmark/newbenchmark.py
from ..base import BaseBenchmark


class NewBenchmark(BaseBenchmark):
    def __init__(self, subset_name, split):
        self.name = 'newbenchmark'
        self.subset_name = subset_name  # if applicable
        self.split = split
        self.dataset = self.load_data()

    def load_data(self):
        # Load and preprocess your dataset here
        return dataset

    def generate(self, model):
        # Generate responses using the model
        return results

    def evaluate(self, data):
        # Implement evaluation metrics
        return evaluated_results

    def save_generated_results(self, results, output_dir, model_name):
        # Save generation results to the output directory
        pass

    def run(self, model, output_dir):
        generated_results = self.generate(model)
        self.save_generated_results(generated_results, output_dir, model.__class__.__name__)
        return self.evaluate(generated_results)
```

- Update the `src/benchmarks/__init__.py` file to import and register your new benchmark:
```python
# In src/benchmarks/__init__.py
from .newbenchmark.newbenchmark import NewBenchmark

benchmark_mapping = {
    # existing benchmarks...
    'new_benchmark_name': ('.newbenchmark', 'NewBenchmark'),
}
```
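Likewise, the new benchmark should be visible under its mapping key; a sketch assuming the key `new_benchmark_name` used above:

```python
from src.benchmarks import list_benchmarks

# 'new_benchmark_name' should now appear among the registered benchmarks
print(list_benchmarks())
```

It can then be run with, e.g., `python main.py --model-name <model_name> --benchmark new_benchmark_name --subset <subset_name> --split test --output-dir output --cache-dir cache`.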