Datasets Guide¶
Overview¶
In this open-source anomaly detection benchmarking project, we only work with publicly available published dataset:
Data management toolstack¶
DuckDB: The data management is this project is implemented using DuckDB (Link: https://duckdb.org/). To access the dataset curated in this project, you would need to first install duckdb:
uv pip install duckdb
Git Large File Storage (LFS)¶
Because large files cannot be committed and versioned directly in Git repositories, we used Git Large File Storage (LFS) for versioning large dataset in this project (Link: https://git-lfs.com/). You can follow the documentation there to setup Git LFS for your local development.
Contributions¶
We welcome contributions to enhance our open-source anomaly detection benchmarking suite. New datasets, use-cases, or extensions to existing repositories are highly encouraged. If you are interested in contributing, feel free to reach out or submit a pull request.