Draft: L2SS-2302: Draft for reading Parquet using generalized reader (!8) · Merge requests · LOFAR2.0 / lotus

About 90% of the reader code can be shared between HDF5 and Parquet; only the format-specific, lazily evaluating lambdas probably differ.
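A minimal sketch of how that split could look, not the actual lotus implementation: the shared part only stores one zero-argument callable per field and evaluates it on first access, while each format supplies its own lambdas. `LazyReader`, `make_loaders`, `fetch` and `fake_file` are hypothetical names used for illustration, and a plain dict stands in for a real HDF5 or Parquet file.

```python
from typing import Any, Callable, Dict, Iterable


class LazyReader:
    """Format-agnostic part: holds one zero-argument loader per field name."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]]):
        self._loaders = loaders
        self._cache: Dict[str, Any] = {}

    def __getitem__(self, name: str) -> Any:
        # Evaluate the lambda only on first access, then reuse the result.
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
        return self._cache[name]

    def keys(self):
        return self._loaders.keys()


def make_loaders(names: Iterable[str], fetch: Callable[[str], Any]):
    """Format-specific part: wrap a backend fetch function in lazy lambdas."""
    # name=name binds the current loop value instead of the last one.
    return {name: (lambda name=name: fetch(name)) for name in names}


# Stand-in backend (assumption): a dict instead of an HDF5 or Parquet file.
fake_file = {"values": [1, 2, 3], "flags": [True, False]}
calls = []


def fetch(name: str) -> Any:
    calls.append(name)  # record when data is actually loaded
    return fake_file[name]


reader = LazyReader(make_loaders(fake_file, fetch))
```

In this sketch nothing is fetched until `reader["values"]` is accessed, and a second access is served from the cache. Only `fetch` would need a different body per format.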

Done:

list and ChunkedArray access. (I believe ChunkedArray already loads across multiple row groups, but this should be verified.)

Todo:

ndarray
object (perhaps check against ChunkedArray? A row length of 1 is possible, but that more or less defeats the purpose of this format)
Top level metadata key:value
Column metadata key:value
Partitioned datasets: https://arrow.apache.org/docs/python/parquet.html#partitioned-datasets-multiple-files
Cleanup: DRY, or rather RUG. Clean up the code shared between the HDF5 and Parquet readers.

Edited May 01, 2025 by Corné Lukken
