kaishi.tabular.filters.invalid_file_extensions
Class definition for filtering invalid tabular file extensions.
Module Contents
kaishi.tabular.filters.invalid_file_extensions.VALID_EXT = ['.json', '.jsonl', '.json.gz', '.jsonl.gz', '.csv', '.csv.gz']
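The default list mixes single suffixes (such as `.csv`) with compound ones (such as `.json.gz`), so a check based only on the last suffix would see `.gz` and miss the distinction. The following is a minimal sketch of one way to test a file name against such a list. The helper `has_valid_extension` is hypothetical and not part of Kaishi, and the case-insensitive comparison is an assumption.

```python
# Local copy of the documented default list, so the sketch is self-contained.
VALID_EXT = [".json", ".jsonl", ".json.gz", ".jsonl.gz", ".csv", ".csv.gz"]


def has_valid_extension(filename, valid_extensions=VALID_EXT):
    """Return True if filename ends with one of the valid extensions.

    Hypothetical helper for illustration only; case-insensitive
    matching is an assumption, not documented library behavior.
    """
    suffixes = tuple(ext.lower() for ext in valid_extensions)
    # str.endswith accepts a tuple and matches if any suffix fits,
    # which handles compound extensions such as ".csv.gz".
    return filename.lower().endswith(suffixes)


files = ["a.csv", "b.json.gz", "c.txt", "d.tar.gz", "E.JSONL"]
kept = [f for f in files if has_valid_extension(f)]
```

Here `d.tar.gz` is rejected because a bare `.gz` is not in the list; only the specific compound extensions are accepted.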
class kaishi.tabular.filters.invalid_file_extensions.FilterInvalidFileExtensions
Bases: kaishi.core.pipeline_component.PipelineComponent
Filter the dataset's file list, removing files whose extensions are not valid tabular extensions.
__call__(self, dataset)
Apply the file extension filter to a tabular dataset.
Parameters
dataset (kaishi.tabular.dataset.TabularDataset) – dataset to apply the file extension filter to
configure(self, valid_extensions=VALID_EXT)
Configure the file extension filter (default list defined in VALID_EXT).
Parameters
valid_extensions (list[str]) – list of file extensions that are valid (each must begin with “.”)
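The requirement that each extension begin with “.” can be checked at configuration time. The sketch below uses a stand-in class, not Kaishi's code: `ExtensionFilterSketch`, its `ValueError` on bad input, and its use of a plain list of file names instead of a `TabularDataset` are all assumptions made for illustration.

```python
# Local copy of the documented default list, so the sketch is self-contained.
VALID_EXT = [".json", ".jsonl", ".json.gz", ".jsonl.gz", ".csv", ".csv.gz"]


class ExtensionFilterSketch:
    """Hypothetical stand-in with the same configure/__call__ shape."""

    def __init__(self):
        self.configure()

    def configure(self, valid_extensions=VALID_EXT):
        # Reject entries such as "csv" that lack the leading dot.
        bad = [ext for ext in valid_extensions if not ext.startswith(".")]
        if bad:
            raise ValueError(f"extensions must begin with '.': {bad}")
        # Store a copy so later changes to the caller's list have no effect.
        self.valid_extensions = list(valid_extensions)

    def __call__(self, filenames):
        # Keep only names ending in one of the configured extensions.
        valid = tuple(self.valid_extensions)
        return [name for name in filenames if name.endswith(valid)]


sketch = ExtensionFilterSketch()
sketch.configure([".csv", ".csv.gz"])
result = sketch(["a.csv", "b.json", "c.csv.gz"])
```

Validating in `configure` rather than in `__call__` surfaces a malformed list before any files are processed.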