Draft: L2SS-2302: Draft for reading Parquet using the generalized reader
About 90% of the reader code can be shared between HDF5 and Parquet; probably only the format-specific, lazily evaluated lambdas differ.
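A minimal sketch of what the shared part could look like, assuming each backend only supplies a mapping from key to a zero-argument loader (the lazily evaluated lambda). `LazyReader` and the `fake_*` loaders are hypothetical names, not the existing reader API; plain Python lists stand in for HDF5 datasets and Parquet columns.

```python
from collections.abc import Callable, Mapping
from typing import Any


class LazyReader(Mapping):
    """Format-agnostic reader: values are loaded on first access and cached.

    Hypothetical sketch; the only format-specific part is the dict of loaders.
    """

    def __init__(self, loaders: dict[str, Callable[[], Any]]):
        self._loaders = loaders
        self._cache: dict[str, Any] = {}

    def __getitem__(self, key: str) -> Any:
        if key not in self._loaders:
            raise KeyError(key)
        if key not in self._cache:
            # Evaluate the backend-specific lambda only once, on first access
            self._cache[key] = self._loaders[key]()
        return self._cache[key]

    def __contains__(self, key: object) -> bool:
        # Membership tests must not trigger a load (Mapping's default would)
        return key in self._loaders

    def __iter__(self):
        return iter(self._loaders)

    def __len__(self) -> int:
        return len(self._loaders)


# Hypothetical backends: only these loaders differ between HDF5 and Parquet.
calls = []


def fake_hdf5_dataset(name):
    calls.append(("hdf5", name))
    return [1, 2, 3]


def fake_parquet_column(name):
    calls.append(("parquet", name))
    return [1, 2, 3]


hdf5_reader = LazyReader({"data": lambda: fake_hdf5_dataset("data")})
parquet_reader = LazyReader({"data": lambda: fake_parquet_column("data")})
```

Nothing is read until a key is accessed, and repeated access hits the cache, so the traversal and caching logic can live in one place while each format contributes only its loaders.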
Done:
- list, ChunkedArray access (I believe ChunkedArray already loads across multiple row groups, but verifying this would be smart)
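To reason about access that spans row groups, here is a stdlib sketch of how a chunked column can map a global row index onto a (chunk, offset) pair. This mirrors what Arrow's ChunkedArray does conceptually, but `ChunkedColumn` is a hypothetical stand-in, with each inner list standing in for one row group.

```python
from bisect import bisect_right
from itertools import accumulate


class ChunkedColumn:
    """Hypothetical stand-in for a column stored as one chunk per row group."""

    def __init__(self, chunks: list[list]):
        self.chunks = chunks
        # Cumulative row counts: offsets[i] is the first global row of chunk i
        self.offsets = [0, *accumulate(len(c) for c in chunks)]

    def __len__(self) -> int:
        return self.offsets[-1]

    def __getitem__(self, index: int):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Pick the chunk whose range [offsets[i], offsets[i + 1]) holds index;
        # bisect_right skips over empty chunks automatically
        chunk = bisect_right(self.offsets, index) - 1
        return self.chunks[chunk][index - self.offsets[chunk]]
```

A check along these lines (uneven and empty row groups, first and last row of each group) is the kind of case worth covering when verifying the real ChunkedArray behaviour.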
Todo:
- ndarray
- object (maybe check against ChunkedArray? A row length of 1 is possible, but it sort of circumvents the entire idea of this format)
- Top-level metadata key:value
- Column metadata key:value
- Partitioned datasets: https://arrow.apache.org/docs/python/parquet.html#partitioned-datasets-multiple-files
- Cleanup: DRY, or rather RUG; clean up the shared code between the HDF5 and Parquet readers.
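For the two metadata items: Arrow exposes schema-level and field-level metadata as a `dict` of `bytes` to `bytes`, or `None` when nothing is set. A small helper like the one below could normalise both into `str` keys and values so the shared reader sees the same shape for either backend. The name `decode_metadata` and the UTF-8 assumption are hypothetical, not part of the existing code.

```python
from typing import Optional


def decode_metadata(raw: Optional[dict[bytes, bytes]]) -> dict[str, str]:
    """Decode Arrow-style bytes metadata into a str -> str dict.

    Hypothetical helper; assumes keys and values are UTF-8 encoded.
    """
    if raw is None:
        # Arrow returns None rather than an empty dict when no metadata is set
        return {}
    return {k.decode("utf-8"): v.decode("utf-8") for k, v in raw.items()}
```

With pyarrow, this would be fed `pq.read_schema(path).metadata` for the top-level entries and `schema.field(name).metadata` per column.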
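For partitioned datasets, the linked Arrow documentation describes a directory layout where each level is named `key=value`. A hypothetical helper that recovers those partition fields from a file path is sketched below; the function name is an assumption, and values are kept as strings rather than type-inferred.

```python
from pathlib import PurePosixPath


def parse_hive_partitions(path: str) -> dict[str, str]:
    """Extract key=value partition fields from a Hive-style file path.

    Hypothetical helper; e.g. "root/year=2019/month=1/part-0.parquet"
    yields {"year": "2019", "month": "1"}.
    """
    partitions = {}
    # Only directory components encode partition keys, not the file name
    for part in PurePosixPath(path).parent.parts:
        key, sep, value = part.partition("=")
        if sep:
            partitions[key] = value
    return partitions
```

In practice pyarrow's dataset API already discovers these partitions, so a helper like this would mainly be useful for attaching partition keys as metadata in the generalized reader.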
Closes L2SS-2302
Edited by Corné Lukken