kaishi.core.misc

Miscellaneous helper functions.

Module Contents

kaishi.core.misc.load_files_by_walk(dir_name_raw: str, file_initializer, recursive: bool = False)

Load files from a directory with an option to recurse.

Parameters
  • dir_name_raw (str) – Directory to load file structure from

  • file_initializer (kaishi file initializer class (e.g. kaishi.core.file.File)) – Data file class to initialize each file with

  • recursive (bool) – Option to load recursively, defaults to False

Returns

canonical directory name, list of subdirectories, and list of initialized files

Return type

str, list, and list
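
The walk described above can be sketched with os.walk. This is a minimal illustration, not the library's implementation: load_files_sketch and as_tuple are hypothetical names, "canonical" is assumed to mean an absolute path, and the initializer is assumed to accept (base directory, relative path, file name).

```python
import os
import tempfile


def load_files_sketch(dir_name_raw, file_initializer, recursive=False):
    """Return (absolute directory, top-level subdirectories, initialized files)."""
    dir_name = os.path.abspath(dir_name_raw)  # assumption: canonical == absolute
    dir_children = []
    files = []
    for root, dirs, filenames in os.walk(dir_name):
        if root == dir_name:
            dir_children = sorted(dirs)
        rel = os.path.relpath(root, dir_name)
        for name in sorted(filenames):
            # assumption: the initializer takes (base dir, relative path, file name)
            files.append(file_initializer(dir_name, rel, name))
        if not recursive:
            break  # only the top-level directory was requested
    return dir_name, dir_children, files


def as_tuple(base, rel, name):
    """Hypothetical stand-in for a file initializer class."""
    return (rel, name)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for path in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(tmp, path), "w") as f:
            f.write("x")
    _, subdirs, flat = load_files_sketch(tmp, as_tuple)
    _, _, nested = load_files_sketch(tmp, as_tuple, recursive=True)
```

With recursive left at False, only a.txt is picked up; with recursive=True, sub/b.txt is included as well.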

kaishi.core.misc.trim_list_by_inds(list_to_trim: list, indices: list)

Trim a list given an unordered list of indices.

Parameters
  • list_to_trim (list) – list to remove elements from

  • indices (list) – indices of list items to remove

Returns

new list, trimmed items

Return type

list, list
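
As an illustration of the behavior described above, the following sketch splits a list into kept and removed items. trim_list_sketch is a hypothetical name, and the ordering of the removed items (original list order) is an assumption rather than documented behavior.

```python
def trim_list_sketch(list_to_trim, indices):
    """Return (kept_items, removed_items) without mutating the input list."""
    to_remove = set(indices)  # order and repeats in `indices` do not matter
    kept = [item for i, item in enumerate(list_to_trim) if i not in to_remove]
    removed = [item for i, item in enumerate(list_to_trim) if i in to_remove]
    return kept, removed


kept, removed = trim_list_sketch(["a", "b", "c", "d"], [3, 1])
```

Converting the indices to a set makes the unordered input harmless: no index shifts as items are dropped, because the new list is built in a single pass.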

kaishi.core.misc.find_duplicate_inds(list_with_duplicates: list)

Find indices of duplicates in a list.

Parameters

list_with_duplicates (list) – list containing duplicate items

Returns

list of duplicate indices, list of unique items (parents of duplicates)

Return type

list and list
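
One way to read the return values is sketched below, assuming the first occurrence of each item is the "parent" and the second list holds the parent index for each duplicate. find_duplicates_sketch is a hypothetical name, and the sketch requires hashable items, which the library may not.

```python
def find_duplicates_sketch(items):
    """Return (duplicate_indices, parent_indices) as parallel lists."""
    first_seen = {}  # item -> index of its first occurrence
    duplicate_inds = []
    parent_inds = []
    for i, item in enumerate(items):
        if item in first_seen:
            duplicate_inds.append(i)
            parent_inds.append(first_seen[item])
        else:
            first_seen[item] = i
    return duplicate_inds, parent_inds


dups, parents = find_duplicates_sketch(["x", "y", "x", "z", "y", "x"])
```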

kaishi.core.misc.find_similar_by_value(list_of_values: list, difference_threshold)

Find near duplicates based on similar reference value.

Parameters
  • list_of_values (list) – list of values to compare

  • difference_threshold – differences within this threshold mark items as similar, and those items are identified for removal

Returns

list of similar indices, list of unique items (parents of similar items)

Return type

list and list
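
A rough sketch of near-duplicate detection by value follows. It assumes numeric values, treats two values as similar when their absolute difference does not exceed the threshold, and pairs each similar index with the index of its parent. All three points are assumptions for illustration, and find_similar_sketch is a hypothetical name.

```python
def find_similar_sketch(values, difference_threshold):
    """Return (similar_indices, parent_indices) as parallel lists."""
    matched = set()  # indices already assigned to a parent
    similar_inds = []
    parent_inds = []
    for i, value in enumerate(values):
        if i in matched:
            continue  # an item flagged as similar cannot become a parent
        for j in range(i + 1, len(values)):
            if j in matched:
                continue
            if abs(values[j] - value) <= difference_threshold:
                matched.add(j)
                similar_inds.append(j)
                parent_inds.append(i)
    return similar_inds, parent_inds


similar, parents = find_similar_sketch([10, 11, 50, 10.5, 52], 2)
```

Here 11 and 10.5 are grouped under the value at index 0, and 52 is grouped under the value at index 2.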

kaishi.core.misc.md5sum(filename: str)

Compute the md5sum of a file.

Parameters

filename (str) – name of file to compute hash of

Returns

hash value
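
For reference, a file's MD5 digest can be computed with the standard library as sketched below. md5sum_sketch is a hypothetical name; returning a hexadecimal string and reading in chunks are assumptions about a reasonable implementation, not a description of the library's code.

```python
import hashlib
import os
import tempfile


def md5sum_sketch(filename, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(filename, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstrate on a small temporary file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
try:
    checksum = md5sum_sketch(tmp.name)
finally:
    os.remove(tmp.name)
```

Reading in chunks keeps memory use flat for large files.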

class kaishi.core.misc.CollapseChildren

Bases: kaishi.core.pipeline_component.PipelineComponent

Restructure potentially multi-layer file tree into a single parent/child layer.

__call__(self, dataset)

kaishi.core.misc.is_valid_label(label_str: str, label_enum)

Check if a label is contained in an enum.

Parameters
  • label_str (str) – string defining the label

  • label_enum – enum to check the label against

Returns

flag indicating if label is valid

Return type

bool
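
A minimal sketch of such a check, assuming the label string is compared against the enum's member names (it could equally be compared against member values). Labels and is_valid_label_sketch are hypothetical names used only for illustration.

```python
from enum import Enum


class Labels(Enum):
    """Hypothetical label enum for the example."""
    CAT = 0
    DOG = 1


def is_valid_label_sketch(label_str, label_enum):
    """Return True if label_str is the name of a member of label_enum."""
    return label_str in label_enum.__members__


known = is_valid_label_sketch("CAT", Labels)
unknown = is_valid_label_sketch("BIRD", Labels)
```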