Metadata-Version: 2.1
Name: audformat
Version: 0.15.0
Summary: Python implementation of audformat
Home-page: https://github.com/audeering/audformat/
Author: Johannes Wagner, Hagen Wierstorf, Baha Eddine Abrougui
Author-email: jwagner@audeering.com, hwierstorf@audeering.com, beddine@audeering.com
License: MIT
Project-URL: Documentation, https://audeering.github.io/audformat/
Keywords: audio,database,annotation
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.8
Requires-Dist: audeer (<2.0.0,>=1.18.0)
Requires-Dist: audiofile (>=0.4.0)
Requires-Dist: iso-639
Requires-Dist: iso3166
Requires-Dist: oyaml
Requires-Dist: pyyaml (>=5.4.1)
Requires-Dist: pandas (>=1.4.1)

=========
audformat
=========

|tests| |coverage| |docs| |python-versions| |license|

Specification and reference implementation of **audformat**.

audformat stores media data,
such as audio or video,
together with corresponding annotations
in a pre-defined way.
This makes it easy to combine or replace databases
in machine learning projects.

An audformat database is a folder
that contains media files
together with a header YAML file
and one or several files storing the annotations.
The database is represented as an ``audformat.Database`` object
and can be loaded with ``audformat.Database.load()``
or written to disk with ``audformat.Database.save()``.

Have a look at the installation_ and usage_ instructions
and the `format specifications`_ as a starting point.


.. _installation: https://audeering.github.io/audformat/install.html
.. _usage: https://audeering.github.io/audformat/create-database.html
.. _format specifications: https://audeering.github.io/audformat/data-introduction.html


.. badges images and links:
.. |tests| image:: https://github.com/audeering/audformat/workflows/Test/badge.svg
    :target: https://github.com/audeering/audformat/actions?query=workflow%3ATest
    :alt: Test status
.. |coverage| image:: https://codecov.io/gh/audeering/audformat/branch/master/graph/badge.svg?token=1FEG9P5XS0
    :target: https://codecov.io/gh/audeering/audformat/
    :alt: code coverage
.. |docs| image:: https://img.shields.io/pypi/v/audformat?label=docs
    :target: https://audeering.github.io/audformat/
    :alt: audformat's documentation
.. |license| image:: https://img.shields.io/badge/license-MIT-green.svg
    :target: https://github.com/audeering/audformat/blob/master/LICENSE
    :alt: audformat's MIT license
.. |python-versions| image:: https://img.shields.io/pypi/pyversions/audformat.svg
    :target: https://pypi.org/project/audformat/
    :alt: audformats's supported Python versions

Changelog
=========

All notable changes to this project will be documented in this file.

The format is based on `Keep a Changelog`_,
and this project adheres to `Semantic Versioning`_.


Version 0.15.0 (2022-08-05)
---------------------------

* Added: ``audformat.MiscTable``
  which can store data
  not associated with media files
* Added: store scheme labels in a misc table
* Added: dictionary ``audformat.Database.misc_tables``
  holding misc tables of a database
* Added: ``audformat.utils.difference()``
  for finding index entries
  that are only part of a single index
  for a given sequence of indices
* Added: ``audformat.utils.is_index_alike()``
  for checking if a sequence of indices
  has the same number of levels,
  level names,
  and matching dtypes
* Added: ``audformat.define.DataType.OBJECT``
* Added: ``audformat.utils.set_index_dtypes()``
  to change dtypes of an index
* Added: ``audformat.testing.add_misc_table()``
* Added: ``audformat.Database.__iter__``
  iterates through all (misc) tables,
  e.g. a user can do ``list(db)``
  to get a list of all (misc) tables
* Changed: ``audformat.Database.update()``
  can now join schemes
  with different labels
* Changed: ``audformat.utils.union()``,
  ``audformat.utils.intersect()``,
  and ``audformat.utils.concat()``
  now support any kind of index
* Changed: ``audformat.utils.intersect()``
  no longer removes segments
  from a segmented index
  that are contained
  in a filewise index
* Changed: require ``pandas>=1.4.1``
* Changed: use ``pandas`` dtype ``'string'``
  instead of ``'object'``
  for storing ``audformat`` dtype ``'str'`` entries
* Changed: use a misc table
  to store the ``'speaker'`` scheme labels
  in the emodb example
  in the documentation
* Changed: ``audformat.utils.join_labels()``
  raises ``ValueError``
  if labels are of different dtype
* Fixed: ensure column IDs are different from index level names
* Fixed: make sure
  ``audformat.Column.set()``
  converts data to dtype of scheme
  before checking if values are in min-max-range
  of scheme
* Fixed: links to ``pandas`` API in the documentation
* Fixed: include methods
  ``to_dict()``,
  ``from_dict()``,
  ``dump()``,
  and attributes
  ``description``,
  ``meta``
  in the documentation for the classes
  ``audformat.Column``,
  ``audformat.Database``,
  ``audformat.Media``,
  ``audformat.Rater``,
  ``audformat.Scheme``,
  ``audformat.Split``,
  ``audformat.Table``
* Fixed: type hint of argument ``dtype``
  in the documentation of ``audformat.Scheme``
* Removed: support for Python 3.7


Version 0.14.3 (2022-06-01)
---------------------------

* Added: ``audformat.utils.map_country()``
* Changed: improve speed of ``audformat.Table.drop_files()``
  for segmented tables


Version 0.14.2 (2022-04-29)
---------------------------

* Added: ``audformat.utils.index_has_overlap()``
* Added: ``audformat.utils.iter_index_by_file()``
* Changed: store categories with integers as ``int64`` instead of ``Int64``
* Changed: require ``audeer>=1.18.0``
* Changed: support ``pandas>=1.4.0``


Version 0.14.1 (2022-03-03)
---------------------------

* Added: ``audformat.utils.map_file_path()``


Version 0.14.0 (2022-02-24)
---------------------------

* Changed: ensure ``audformat.testing.create_database()``
  uses Unix path separators
* Changed: don't allow ``\`` path entries
  in a portable database
* Changed: mark deprecated ``root`` argument
  of ``audformat.testing.create_audio_files()``
  to be removed in version 1.0.0


Version 0.13.3 (2022-02-07)
---------------------------

* Fixed: conversion of pickle protocol 5 files
  to pickle protocol 4 in cache


Version 0.13.2 (2022-01-27)
---------------------------

* Fixed: reintroduce sorting the output of
  ``audformat.Database.files`` and
  ``audformat.Database.segments``


Version 0.13.1 (2022-01-26)
---------------------------

* Fixed: changelog for 0.13.0


Version 0.13.0 (2022-01-26)
---------------------------

* Changed: ``audformat.utils.union()`` no longer sorts levels
* Changed: ``audformat.Table.save()`` forces pickle format 4
* Changed: clean up test requirements
* Changed: require ``pandas < 1.4.0``


Version 0.12.4 (2022-01-12)
---------------------------

* Changed: the API documentation on the ``language`` argument
  of ``audformat.Database`` is more verbose now
* Changed: the difference between
  ``audformat.define.DataType.TIME``
  and ``audformat.define.DataType.DATE``
  is now discussed in the API documentation
* Fixed: saving a not loaded table to CSV
  when a PKL file is present
* Fixed: ``pandas`` deprecation warnings


Version 0.12.3 (2022-01-03)
---------------------------

* Removed: Python 3.6 support


Version 0.12.2 (2021-11-18)
---------------------------

* Added: ``audformat.assert_no_duplicates()``
* Changed: ``audformat.assert_index()`` no longer checks for duplicates


Version 0.12.1 (2021-11-17)
---------------------------

* Added: ``audformat.utils.hash()``
* Added: ``audformat.utils.expand_file_path()``
* Added: ``audformat.utils.replace_file_extension()``
* Changed: use ``yaml.CLoader`` for faster header reading


Version 0.12.0 (2021-11-10)
---------------------------

* Added: ``as_segmented``, ``allow_nat``, ``root``, ``num_workers``
  arguments to ``audformat.Table.get()``
* Added: ``as_segmented``, ``allow_nat``, ``root``, ``num_workers``
  arguments to ``audformat.Column.get()``
* Added: ``files_duration`` argument
  to ``audformat.utils.to_segmented_index()``
* Added: ``audformat.Database.files_duration()``
* Changed: changed default value of ``load_data`` argument
  in ``audformat.Database.load()`` to ``False``
* Changed: speed up ``audformat.Database.files``
  and ``audformat.Database.segments``
* Fixed: re-add support for ``pandas>=1.3``


Version 0.11.6 (2021-08-20)
---------------------------

* Added: support for Python 3.9
* Fixed: speed up ``audformat.utils.union()``
* Fixed: ``audformat.Column.set()`` with ``pd.Series``
  and ``np.array`` for a scheme with fixed labels
  and containing ``NaN`` values


Version 0.11.5 (2021-08-09)
---------------------------

* Removed: duration scheme and column
  from conventions
  and emodb example


Version 0.11.4 (2021-08-05)
---------------------------

* Added: custom ``BadKeyError`` when key is not found
* Changed: limit to ``pandas <1.3``
  until it works again for newer ``pandas`` versions
* Changed: remove the ``<1.0.0`` limit for ``audiofile``
  as a stable release is available and the API has not changed


Version 0.11.3 (2021-06-10)
---------------------------

* Added: ``audformat.utils.duration``
* Fixed: description of ``audformat.Database.is_portable``
  in documentation


Version 0.11.2 (2021-05-12)
---------------------------

* Added: ``audformat.utils.join_schemes``


Version 0.11.1 (2021-05-11)
---------------------------

* Added: ``Database.is_portable``
* Added: ``copy_media`` argument to ``Database.update()``
* Changed: remove ``root`` argument from ``testing.create_audio_files()`` and instead use ``Database.root``
* Fixed: ``utils.concat()`` converts to nullable dtype
* Fixed: ``utils.concat()`` returns ``DataFrame`` if input contains at least one ``DataFrame``


Version 0.11.0 (2021-05-06)
---------------------------

Note: tables stored from this version upwards cannot be loaded with older versions

* Added: ``Database.root``
* Added: ``utils.join_labels()``
* Added: ``Scheme.replace_labels()``
* Changed: set dependency to ``pandas>=1.1.5``
* Changed: do not compress pickled table files


Version 0.10.2 (2021-04-22)
---------------------------

* Changed: ``allow_nat`` argument to ``utils.to_segmented_index()``


Version 0.10.1 (2021-03-31)
---------------------------

* Fixed: ``audformat.assert_index()`` checks for correct dtypes


Version 0.10.0 (2021-03-18)
---------------------------

* Added: ``audformat.Database.update()``
* Added: ``audformat.Table.update()``
* Added: ``overwrite`` argument to ``audformat.utils.concat()``
* Changed: result of ``audformat.Table.__add__()`` is no longer assigned to a ``audformat.Database``


Version 0.9.8 (2021-02-23)
--------------------------

* Added: ``audformat.Database.license``
* Added: ``audformat.Database.license_url``
* Added: ``audformat.Database.author``
* Added: ``audformat.Database.organization``
* Added: ``audformat.utils.intersect()`` for index objects
* Added: ``audformat.utils.union()`` for index objects
* Changed: ``Database.load()`` raises error if table file missing
* Changed: forbid duplicates in ``audformat`` conform indices
* Fixed: ``audformat.Table.__add__()`` returned wrong values
  for some index combinations


Version 0.9.7 (2021-02-01)
--------------------------

* Added: ``update_other_formats`` argument to ``audformat.Table.save()``
  to make sure existing files in other formats are updated as well
* Changed: use ``round_trip`` argument when loading CSV files
  to ensure dataframes are equal after storing and loading again


Version 0.9.6 (2021-01-28)
--------------------------

* Fixed: implemented ``audformat.Database.__eq__`` and return ``True``
  for identical databases


Version 0.9.5 (2021-01-14)
--------------------------

* Changed: use nullable Pandas' type ``'boolean'`` for ``bool`` schemes
* Fixed: ``Scheme.draw()`` generates boolean values if scheme is ``bool``


Version 0.9.4 (2021-01-11)
--------------------------

* Changed: add arguments ``num_workers`` and ``verbose`` to
  ``audformat.Database.load()``


Version 0.9.3 (2021-01-07)
--------------------------

* Fixed: avoid sphinx syntax in CHANGELOG


Version 0.9.2 (2021-01-07)
--------------------------

* Changed: add arguments ``num_workers`` and ``verbose`` to
  ``audformat.Database.drop_files()``,
  ``audformat.Database.map_files()``,
  ``audformat.Database.pick_files()``,
  ``audformat.Database.save()``
* Changed: ``audformat.segmented_index()``
  support ``int`` and ``float``, which will be interpreted as seconds
* Fixed: ``audformat.utils.to_segmented_index()``
  returns correct index type for ``NaT``


Version 0.9.1 (2020-12-21)
--------------------------

* Fixed: add column name to HTML Series output in docs
* Fixed: removed mentioning of
  ``NotConformToUnifiedFormat`` error
  and ``RedundantArgumentError`` error
* Fixed: add missing errors to docstring
  of ``audformat.Table.set()``
  and ``audformat.Column.set()``


Version 0.9.0 (2020-12-18)
--------------------------

* Added: initial release public release


.. _Keep a Changelog:
    https://keepachangelog.com/en/1.0.0/
.. _Semantic Versioning:
    https://semver.org/spec/v2.0.0.html


