Utility functions

Utility functions are intended for use within the BloArk project only. They are NOT intended for end users and are NOT part of the public API.

This page helps developers understand the internal dependencies of the BloArk project and encourages contributors to reuse existing utility functions instead of writing their own.

Note

Utility functions marked with @unstable in the code are also marked as Unstable on this page.

bloark.utils.cleanup_dir(path: str, onerror: typing.Callable | None = <function _rmtree_error_handler>)

Clean up the directory.

Parameters:
  • path (str) – The path to the directory.

  • onerror (Union[Callable, None]) – The error handler.
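The following is a minimal sketch of how a cleanup helper with an error handler can be built on `shutil.rmtree`. It is not BloArk's implementation: the names `cleanup_dir_sketch` and `retry_after_chmod` are hypothetical, and the sketch assumes that "clean up" means removing the whole directory tree.

```python
import os
import shutil
import stat
import sys
import tempfile


def retry_after_chmod(func, path, _exc):
    """Hypothetical handler: make the entry writable, then retry the failed call.

    This is the common workaround for read-only files on Windows.
    """
    os.chmod(path, stat.S_IWRITE)
    func(path)


def cleanup_dir_sketch(path, onerror=retry_after_chmod):
    """Remove the directory tree at `path` if it exists (assumed behavior)."""
    if not os.path.isdir(path):
        return
    if sys.version_info >= (3, 12):
        # Python 3.12 deprecated the `onerror` argument in favor of `onexc`.
        shutil.rmtree(path, onexc=onerror)
    else:
        shutil.rmtree(path, onerror=onerror)


# Demo on a throwaway directory that contains a single file.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "block.jsonl"), "w") as f:
    f.write("{}\n")
cleanup_dir_sketch(demo_dir)
removed = not os.path.exists(demo_dir)
```

Passing `onerror=None` in this sketch lets any deletion error propagate to the caller.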

bloark.utils.compress_zstd(input_path: str, output_path: str)

Compress the blocks into a Zstandard file.

Parameters:
  • input_path (str) – The input path.

  • output_path (str) – The output path.

bloark.utils.compute_total_available_space(output_dir: str) → int

Deprecated since version 0.7.1: No longer needed.

Compute the total available space in the output directory.

Parameters:

output_dir (str) – The output directory.

Returns:

The total available space in bytes.

Return type:

int
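For readers who need an equivalent now that this helper is deprecated, the standard library can report free space directly. This is a sketch under the assumption that "available space" means the free bytes on the filesystem that holds the directory; `available_space_sketch` is a hypothetical name, not part of BloArk.

```python
import shutil
import tempfile


def available_space_sketch(output_dir: str) -> int:
    """Return the free bytes on the filesystem containing `output_dir`."""
    return shutil.disk_usage(output_dir).free


# Demo: query the system's temporary directory, which always exists.
free_bytes = available_space_sketch(tempfile.gettempdir())
```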

bloark.utils.decompress_zstd(input_path: str, output_path: str)

Decompress the blocks from a Zstandard file.

Parameters:
  • input_path (str) – The input path.

  • output_path (str) – The output path.

bloark.utils.get_curr_version()

Get the version of the package.

Returns:

version – The version of the package.

Return type:

str

bloark.utils.get_decompress_output_path(input_path: str, output_dir: str)

Get the output path of the decompressed file.

Parameters:
  • input_path (str) – The input path.

  • output_dir (str) – The output directory.

Returns:

The output path.

Return type:

str
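One plausible way to derive such a path is sketched below. The rule used here (keep the input file name, drop a trailing `.zst` suffix, and place the result in the output directory) is an assumption made for illustration; the actual naming rule in BloArk may differ. `decompress_output_path_sketch` is a hypothetical name.

```python
import os


def decompress_output_path_sketch(input_path: str, output_dir: str) -> str:
    """Map a compressed input file to a path inside `output_dir`."""
    name = os.path.basename(input_path)
    # Assumption: compressed files carry a ".zst" suffix that should be dropped.
    if name.endswith(".zst"):
        name = name[: -len(".zst")]
    return os.path.join(output_dir, name)


# Demo with illustrative paths; nothing is read from or written to disk.
example_path = decompress_output_path_sketch(
    os.path.join("input", "part-0001.jsonl.zst"), "output"
)
```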

bloark.utils.get_estimated_size(path: str) → int

Get the estimated size of the file.

Parameters:

path (str) – The path to the file.

Returns:

The estimated size of the file.

Return type:

int

bloark.utils.get_file_list(input_path: str, extensions: List[str] = None) → List[str]

Get the list of files in the input directory.

Parameters:
  • input_path (str) – The input directory.

  • extensions (List[str]) – The list of extensions to consider.

Raises:

FileNotFoundError – If the path does not exist.

Returns:

The list of files.

Return type:

List[str]
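The documented contract (an input path, an optional extension filter, and `FileNotFoundError` for a missing path) can be sketched with the standard library as follows. This is not BloArk's code: `file_list_sketch` is a hypothetical name, and the choices to list only the top level of the directory, to accept a single file as input, and to sort the result are assumptions.

```python
import os
import tempfile
from typing import List, Optional


def file_list_sketch(input_path: str, extensions: Optional[List[str]] = None) -> List[str]:
    """List files under `input_path`, optionally filtered by extension."""
    if not os.path.exists(input_path):
        raise FileNotFoundError(input_path)
    if os.path.isfile(input_path):
        # Assumption: a single file is treated as a one-element list.
        candidates = [input_path]
    else:
        # Assumption: only the top level is listed, in sorted order.
        candidates = [os.path.join(input_path, n) for n in sorted(os.listdir(input_path))]
        candidates = [p for p in candidates if os.path.isfile(p)]
    if extensions is None:
        return candidates
    return [p for p in candidates if p.endswith(tuple(extensions))]


# Demo: a temporary directory holding one matching and one non-matching file.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("a.zst", "b.txt"):
        with open(os.path.join(demo_dir, name), "w") as f:
            f.write("x")
    matched = [os.path.basename(p) for p in file_list_sketch(demo_dir, [".zst"])]
    everything = [os.path.basename(p) for p in file_list_sketch(demo_dir)]
```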

bloark.utils.get_line_positions(path: str) → List[int]

Get the start positions of all lines in the given file so that they can be reused later to read a specific line.

Parameters:

path (str) – The path to the file.

Returns:

The list of line positions.

Return type:

List[int]
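The general technique behind such an index is to record the byte offset at which each line begins. The sketch below illustrates it under the assumption that positions are byte offsets taken in binary mode; `line_positions_sketch` is a hypothetical name, not BloArk's implementation.

```python
import os
import tempfile
from typing import List


def line_positions_sketch(path: str) -> List[int]:
    """Return the byte offset at which each line of the file starts."""
    positions = []
    with open(path, "rb") as f:
        while True:
            start = f.tell()  # offset before the next line is consumed
            if not f.readline():
                break  # an empty read means end of file
            positions.append(start)
    return positions


# Demo: three lines of 6, 5, and 6 bytes (newline included).
with tempfile.NamedTemporaryFile("wb", suffix=".jsonl", delete=False) as tmp:
    tmp.write(b"alpha\nbeta\ngamma\n")
positions = line_positions_sketch(tmp.name)  # [0, 6, 11]
os.remove(tmp.name)
```

Building the index costs one pass over the file, after which any line can be reached with a single seek.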

bloark.utils.get_memory_consumption() → int

Get the memory consumption of the current process.

Returns:

The memory consumption in MB.

Return type:

int
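As a rough, standard-library-only illustration of reporting process memory in MB, the sketch below reads the peak resident set size. It assumes a Unix-like platform (the `resource` module is unavailable on Windows) and measures peak rather than current usage, so it may not match what BloArk reports. `peak_memory_mb_sketch` is a hypothetical name.

```python
import resource
import sys


def peak_memory_mb_sketch() -> int:
    """Return the peak resident set size of this process in whole MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return int(peak / divisor)


memory_mb = peak_memory_mb_sketch()
```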

bloark.utils.parse_schema(obj)

Parse the schema of the given object. Used for glimpse.

Parameters:

obj (Any) – The object to parse.
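The return format is not documented here. Purely as an illustration of what "parsing a schema" of a JSON-like object can look like, the sketch below replaces every leaf value with its type name. The function name `schema_sketch` and the output shape are assumptions, not BloArk's behavior.

```python
from typing import Any


def schema_sketch(obj: Any) -> Any:
    """Replace every leaf in a JSON-like object with the name of its type."""
    if isinstance(obj, dict):
        return {key: schema_sketch(value) for key, value in obj.items()}
    if isinstance(obj, list):
        # Assumption: a list is summarized by the schema of its first element.
        return [schema_sketch(obj[0])] if obj else []
    return type(obj).__name__


# Demo on a small nested record.
record = {"id": 1, "tags": ["a", "b"], "meta": {"ok": True}}
schema = schema_sketch(record)
```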

bloark.utils.prepare_output_dir(output_dir: str)

Prepare the output directory.

Parameters:

output_dir (str) – The output directory.

bloark.utils.read_line_in_file(path: str, position: int) → str

Read a specific line in the file without loading the entire file into memory.

Parameters:
  • path (str) – The path to the file.

  • position (int) – The start position of the line.

Returns:

The line as a string.

Return type:

str
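Reading one line without loading the whole file usually comes down to a seek followed by a single `readline`. The sketch below illustrates this, assuming that `position` is a byte offset and that the file is UTF-8 encoded; `read_line_at_sketch` is a hypothetical name, and whether BloArk keeps the trailing newline is not stated here.

```python
import os
import tempfile


def read_line_at_sketch(path: str, position: int) -> str:
    """Return the line that starts at byte offset `position`."""
    with open(path, "rb") as f:
        f.seek(position)  # jump straight to the start of the wanted line
        # The trailing newline is kept in this sketch.
        return f.readline().decode("utf-8")


# Demo: "alpha\n" occupies bytes 0-5, so the second line starts at offset 6.
with tempfile.NamedTemporaryFile("wb", suffix=".jsonl", delete=False) as tmp:
    tmp.write(b"alpha\nbeta\ngamma\n")
second_line = read_line_at_sketch(tmp.name, 6)
first_line = read_line_at_sketch(tmp.name, 0)
os.remove(tmp.name)
```

Only the requested line is read, so memory use stays independent of the file size.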