kaishi.core.file_group

Class definition for reading/writing files of various types.

Module Contents

class kaishi.core.file_group.FileGroup(recursive: bool)

Class for reading and performing general operations on groups of files.

__getitem__(self, key)

Get a specific file object.

load_dir(self, source: str, file_initializer, recursive: bool)

Read file names in a directory.

Parameters
  • source (str) – Directory to load from

  • file_initializer (kaishi file initializer class (e.g. kaishi.core.file.File)) – Data file class to initialize each file with

  • recursive (bool) – flag to indicate whether subdirectories are read as well
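
The behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not kaishi's implementation: `SimpleFile` and `load_dir_sketch` are hypothetical names, and the three-argument constructor (base directory, relative path, file name) is an assumption about what a file initializer accepts.

```python
import os
import tempfile


class SimpleFile:
    """Hypothetical stand-in for a file initializer class."""

    def __init__(self, basedir, relative_path, filename):
        self.basedir = basedir
        self.relative_path = relative_path
        self.filename = filename


def load_dir_sketch(source, file_initializer, recursive):
    """Create one file object per file found under `source` (illustration only)."""
    files = []
    for dirpath, dirnames, filenames in os.walk(source):
        if not recursive:
            # Emptying dirnames in place stops os.walk from descending further
            dirnames.clear()
        relative_path = os.path.relpath(dirpath, source)
        for name in sorted(filenames):
            files.append(file_initializer(source, relative_path, name))
    return files


# Usage: build a small throwaway tree and load it both ways
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for path in ("a.txt", os.path.join("sub", "b.txt")):
        open(os.path.join(tmp, path), "w").close()
    flat = load_dir_sketch(tmp, SimpleFile, recursive=False)
    deep = load_dir_sketch(tmp, SimpleFile, recursive=True)
```

With `recursive=False` only the top-level file is picked up; with `recursive=True` the file in `sub` is included as well.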

get_pipeline_options(self)

Returns available pipeline options for this dataset.

Returns

list of uninitialized pipeline component objects

Return type

list

configure_pipeline(self, choices: list = None, verbose: bool = False)

Configures the sequence of components in the data processing pipeline.

Parameters
  • choices (list) – list of pipeline choices

  • verbose (bool) – flag to indicate verbosity
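
One way to picture the relationship between the available options and the configured sequence is the sketch below. It assumes that choices are given as component class names, which the reference above does not state; `DropEmptyNames`, `SortNames`, `OPTIONS`, and `configure_pipeline_sketch` are hypothetical names used only for illustration.

```python
class DropEmptyNames:
    """Hypothetical pipeline component: removes empty file names."""

    def __call__(self, names):
        return [n for n in names if n]


class SortNames:
    """Hypothetical pipeline component: sorts file names."""

    def __call__(self, names):
        return sorted(names)


# Uninitialized component classes, analogous to a list of pipeline options
OPTIONS = [DropEmptyNames, SortNames]


def configure_pipeline_sketch(choices=None, verbose=False):
    """Build an ordered list of initialized components from the chosen names."""
    by_name = {cls.__name__: cls for cls in OPTIONS}
    components = []
    for choice in choices or []:
        if choice not in by_name:
            raise ValueError(f"Unknown pipeline component: {choice}")
        components.append(by_name[choice]())  # initialize the chosen class
        if verbose:
            print(f"Added {choice}")
    return components


pipeline = configure_pipeline_sketch(["DropEmptyNames", "SortNames"])
```

The order of `choices` determines the order of the components, so the same options can be arranged into different pipelines.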

file_report(self, max_file_entries=16, max_filter_entries=10)

Show a report of valid and invalid data.

Parameters
  • max_file_entries (int) – max number of entries to print of file list

  • max_filter_entries (int) – max number of entries to print per filter category (e.g. duplicates, similar, etc.)
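
The two limits can be read as independent caps on how much of each list is printed. The sketch below shows that truncation logic under assumed inputs (a list of valid file names and a dict mapping filter categories to file names); the function name and report wording are hypothetical and do not reproduce kaishi's actual output.

```python
def file_report_sketch(valid, filtered, max_file_entries=16, max_filter_entries=10):
    """Return report lines, truncating long lists (illustration only)."""
    lines = [f"Current file list ({len(valid)} files):"]
    lines += [f"  {name}" for name in valid[:max_file_entries]]
    if len(valid) > max_file_entries:
        lines.append(f"  ... {len(valid) - max_file_entries} more")
    for category, names in filtered.items():
        lines.append(f"Filtered as {category} ({len(names)} files):")
        lines += [f"  {name}" for name in names[:max_filter_entries]]
        if len(names) > max_filter_entries:
            lines.append(f"  ... {len(names) - max_filter_entries} more")
    return lines


report = file_report_sketch(
    ["a.txt", "b.txt", "c.txt"],
    {"duplicates": ["d.txt", "e.txt"]},
    max_file_entries=2,
    max_filter_entries=1,
)
print("\n".join(report))
```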

run_pipeline(self, verbose: bool = False)

Run the pipeline as configured.

Parameters
  • verbose (bool) – flag to indicate verbosity
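
Running a configured pipeline can be thought of as applying each component in order to the current file list. The following sketch assumes components are plain callables that take and return a list of file names; that interface and the name `run_pipeline_sketch` are hypothetical, not taken from kaishi.

```python
def run_pipeline_sketch(components, files, verbose=False):
    """Apply each component in sequence to the file list (illustration only)."""
    for component in components:
        before = len(files)
        files = component(files)
        if verbose:
            name = getattr(component, "__name__", type(component).__name__)
            print(f"{name}: {before} -> {len(files)} files")
    return files


def drop_empty(names):
    """Hypothetical component: removes empty file names."""
    return [n for n in names if n]


result = run_pipeline_sketch([drop_empty, sorted], ["b.txt", "", "a.txt"])
```

Here `result` holds the non-empty names in sorted order, since `drop_empty` runs before `sorted`.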