icollect

Collocations.icollect(start=None, end=None, files=None, **kwargs)

Load all files between two dates, sorted by their starting time.

Does the same as collect() but works as a generator. Instead of loading all files at once, it loads them in chunks (the chunk size is set by max_workers). This method therefore uses less memory than collect() but is slower. Rule of thumb: use it in for-loops; if you need all files at once, use collect() instead.
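The chunked, generator-based loading described above can be pictured with a minimal sketch. This is not the library's implementation: load_in_chunks and fake_load are hypothetical names, and the sketch assumes a thread pool whose size also serves as the chunk size.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def load_in_chunks(paths, load, max_workers=4):
    """Yield (path, content) pairs, holding at most one chunk in memory.

    Hypothetical helper: `load` is any callable that reads a single file.
    """
    paths = iter(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while True:
            # Take the next max_workers paths; stop when none are left.
            chunk = list(islice(paths, max_workers))
            if not chunk:
                return
            # pool.map preserves input order, so the output order
            # matches the order of `paths`.
            yield from zip(chunk, pool.map(load, chunk))


def fake_load(path):
    # Hypothetical stand-in for reading a file from disk.
    return path.upper()


results = list(load_in_chunks(["a", "b", "c", "d", "e"], fake_load, max_workers=2))
```

Because each chunk must finish before the next one starts, workers can sit idle at chunk boundaries, which is one plausible reason a generator of this kind is slower than loading everything in a single pass.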

Parameters:
  • start – The same as in find().

  • end – The same as in find().

  • files – If you already have a list of files that you want to process, pass it here. The list can contain filenames or lists (bundles) of filenames. If this parameter is given, start and end must not be set.

  • **kwargs – Additional keyword arguments that are allowed for imap(). Some might be overwritten by this method.
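The rule that files cannot be combined with start and end can be illustrated with a small sketch. The function check_icollect_args, the choice of ValueError, and the filenames are assumptions made for illustration; the library may validate its arguments differently.

```python
def check_icollect_args(start=None, end=None, files=None):
    """Hypothetical validator mirroring the documented rule for `files`."""
    if files is not None and (start is not None or end is not None):
        # Assumption: the real method may raise a different exception type.
        raise ValueError("Pass either `files` or `start`/`end`, not both.")
    return files


# A flat list of filenames and a list containing a bundle both pass the check.
flat = check_icollect_args(files=["a.nc", "b.nc"])
bundled = check_icollect_args(files=["a.nc", ["b.nc", "c.nc"]])
```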

Yields:

A tuple of the FileInfo object of a file and its content. The tuples are yielded sorted by file starting time.

Examples:

## Perfect for iterating over many files.
for file_info, content in fileset.icollect("2018-01-01", "2018-01-02"):
    # do something with file and content...
    ...

## If you want to have all files at once, do not use this:
data_list = list(fileset.icollect("2018-01-01", "2018-01-02"))

# This version is faster:
data_list = fileset.collect("2018-01-01", "2018-01-02")