kaishi.tabular.filters.invalid_file_extensions

Class definition for filtering out files with invalid (non-tabular) file extensions.

Module Contents

kaishi.tabular.filters.invalid_file_extensions.VALID_EXT = ['.json', '.jsonl', '.json.gz', '.jsonl.gz', '.csv', '.csv.gz']
class kaishi.tabular.filters.invalid_file_extensions.FilterInvalidFileExtensions

Bases: kaishi.core.pipeline_component.PipelineComponent

Filter out files in the dataset's file list whose extensions are not valid tabular extensions.

__call__(self, dataset)

Apply the file extension filter to a tabular dataset.

Parameters

dataset (kaishi.tabular.dataset.TabularDataset) – dataset to perform file extension filter on
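
The following is a minimal, standalone sketch of the kind of suffix check such a filter might perform. It does not import kaishi and is not the library's implementation: the helper name split_by_extension is hypothetical, and case-sensitive suffix matching is an assumption.

```python
# Default list copied from the module-level VALID_EXT documented above.
VALID_EXT = ['.json', '.jsonl', '.json.gz', '.jsonl.gz', '.csv', '.csv.gz']


def split_by_extension(filenames, valid_extensions=VALID_EXT):
    """Partition filenames into (kept, filtered) by suffix match.

    Hypothetical helper for illustration only; matching is case-sensitive.
    """
    # str.endswith accepts a tuple, so compound suffixes such as
    # '.json.gz' are matched as a whole rather than by the last '.gz'.
    suffixes = tuple(valid_extensions)
    kept, filtered = [], []
    for name in filenames:
        if name.endswith(suffixes):
            kept.append(name)
        else:
            filtered.append(name)
    return kept, filtered


kept, filtered = split_by_extension(
    ["a.csv", "b.json.gz", "notes.txt", "archive.tar.gz"]
)
```

Matching on the full suffix rather than on the final extension is what lets '.csv.gz' pass while 'archive.tar.gz' is filtered.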

configure(self, valid_extensions=VALID_EXT)

Configure the file extension filter (default list defined in VALID_EXT).

Parameters

valid_extensions (list[str]) – list of file extensions that are valid (each must begin with “.”)
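
Because each entry must begin with ".", it can be useful to check a custom list before passing it to configure. The sketch below is one way to do that; check_extensions is a hypothetical helper, not part of kaishi, and whether configure performs such a check itself is not stated here.

```python
def check_extensions(valid_extensions):
    """Return the list unchanged if every entry is a str starting with '.'.

    Hypothetical pre-check for illustration; raises ValueError otherwise.
    """
    bad = [
        ext for ext in valid_extensions
        if not isinstance(ext, str) or not ext.startswith(".")
    ]
    if bad:
        raise ValueError(f"extensions must begin with '.': {bad!r}")
    return list(valid_extensions)


# A custom list that would pass the check.
custom = check_extensions([".csv", ".csv.gz"])
```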