Draft: L2SS-2302: Draft for reading Parquet using generalized reader

Roughly 90% of the reader code can be shared between HDF5 and Parquet; only the format-specific, lazily evaluated lambdas probably differ.
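As a rough illustration of that split, here is a minimal sketch, not the code in this MR. `LazyReader` and `make_loader` are hypothetical names, and plain dicts stand in for the real HDF5 and Parquet backends. The shared part only knows about zero-argument callables; the format-specific part is whatever each lambda does when it is finally called.

```python
from typing import Any, Callable, Dict


class LazyReader:
    """Format-agnostic part: one zero-argument loader per field, cached on first use."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]]):
        self._loaders = loaders
        self._cache: Dict[str, Any] = {}

    def __getitem__(self, name: str) -> Any:
        # The loader runs only on first access; later accesses hit the cache.
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
        return self._cache[name]


# Stand-ins for real storage (assumption: real loaders would call h5py / pyarrow).
fake_hdf5 = {"values": [1, 2, 3]}
fake_parquet = {"values": [4, 5, 6]}
calls = []  # records which loaders actually ran


def make_loader(backend_name: str, storage: Dict[str, Any], field: str) -> Callable[[], Any]:
    """Format-specific part: build the lazily evaluated loader for one field."""
    def load() -> Any:
        calls.append((backend_name, field))
        return storage[field]
    return load


hdf5_reader = LazyReader({"values": make_loader("hdf5", fake_hdf5, "values")})
parquet_reader = LazyReader({"values": make_loader("parquet", fake_parquet, "values")})
```

Nothing is read until a field is indexed, e.g. `parquet_reader["values"]`, and a second access reuses the cached result.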

Done:

  1. list and ChunkedArray access. (I believe ChunkedArray already loads across multiple row groups, but this should be verified.)

Todo:

  1. ndarray
  2. object (maybe check against ChunkedArray? A row length of 1 is possible, but it more or less circumvents the whole idea of this format)
  3. Top level metadata key:value
  4. Column metadata key:value
  5. Partitioned datasets: https://arrow.apache.org/docs/python/parquet.html#partitioned-datasets-multiple-files
  6. Cleanup: DRY, or rather RUG. Clean up the code shared between the HDF5 and Parquet readers.

Closes L2SS-2302

Edited by Corné Lukken
