classdef Collapser < AssociatedDataset
% collapse small footprint upon large footprint
%
% This class can be used for collocations where the primary has a
% significantly larger footprint than the secondary, such as between
% MHS and CloudSat or between MHS and AVHRR. It can be used e.g. for
% calculation of arbitrary statistics, such as mean, standard
% deviation, etc., or for selecting a single small-footprint for each
% large-footprint. Whereas the original collocations contain one entry
% for each of the smaller footprints --- thus repeating many times the
% entry for the larger footprint --- a dataset with this class contains
% at most one entry for each of the larger footprints. This is usually
% the desirable behaviour.
%
% The class is normally associated with an AssociatedDataset that works on
% the full-size --- often this is FieldCopier.
%
% For each primary footprint, the class copies data for the primary and
% gathers data for the secondary. The data for the secondary is then
% input to processing-functions, that can do arbitrary processing.
% For example, processing functions may calculate the mean, the standard
% deviation, the secondary closest to the primary, etc. Any
% function-handle can be passed. The class can either pass all secondary
% footprints, or a sub-set based on prescribed limitations (also with
% function-handles). If no secondary footprints meet the requirements,
% the primary footprint is not selected either. Limitations can be
% either global (applying to all processors) or local (applying to a
% single processor).
%
% Information on fields to be copied, fields to be processed, processor
% functions, local limitatiors, global limitations, and how to store
% fields, is all contained in a structure passed on to the constructor.
% See constructor documentation for details.
%
% Collapser Properties:
%
% fieldstruct - Structure with complete field information
% overall_limitators - Limitations applied to all fields
% (remaining properties inherited from AssociatedDataset.
% Use properties for a complete listing)
% vectorised - Determine if processors/limitators can be
% vectorised
%
% Collapser Methods:
%
% Collapser - Create Collapser object
% (remaining methods from AssociatedDataset)
%
% Example:
%
% >> mfcs.RO_ice_water_path.limitators = {@(X)(X>0)};
% >> mfcs.RO_ice_water_path.processors.MEAN = @mean;
% >> mfcs.RO_ice_water_path.stored.NO.type = 'single';
% >> global_lims = {@(X)(X(:, mhscpr.cols.DIST)<7.5), @(X)(abs(X(:, mhscpr.cols.INT))<600)};
% % f from FieldCopier example
% >> mfc = Collapser(f, mfcs, global_lims, ...
% 'name', 'MyLittleThing', 'basedir', '/some/path', ...
% 'subdir', '$YEAR4/$MONTH/$DAY', ...
% 'filename', 'mean_collocations_$SAT.nc.gz');
%
% See also: AssociatedDataset (abstract superclass), FieldCopier,
% CollocatedDataset, SatDataset, Collapser (constructor).
%
% Don't forget the Collocation User's Guide
% TODO:
% - store result of limitators as bitfield?
% - store limitators in NetCDF props
properties (Transient, SetAccess = protected)
%members = []; % See AssociatedDataset.members
parent = []; % See AssociatedDataset.parent
dependencies = {}; % See AssociatedDataset.dependencies
% Structure with all fields information
%
% The members of this fieldstruct must correspond to fieldnames
% contained by the parent dataset. The value of each entry
% in this struct is itself a structure, as follows:
%
% entry_name (structure with name corresponding to parent fieldname)
%
% limitators cell-array of function_handle
%
% Depending on the value of the collapser property
% 'vectorised' (defaults to false), this is either in
% non-vectorised or in vectorised form.
%
% In non-vectorised form, each limitator is called for every
% primary footprint. It receives as input the subset of secondary
% footprints that fall within the primary footprint. Each
% limitator must return a logical 1-D vector with the same
% length as the number of footprints in the input. Any later
% processing functions are applied only to those collocations
% for which ALL limitators return true.
%
% In vectorised form, each limitator is called once per
% granule. As an argument, it receives a p*N*q ndarray,
% where p is the maximum number of secondaries to ever
% occur inside a primary, N is the number of primaries, and q
% is the number of values per measurement for this field. Note that for each primary with
% s
Fieldcopier.
% By default, no attributes are stored and all
% fields are stored as single (float).
%
% incore logical (scalar boolean)
%
% If set to 'true', this field is not taken from the
% AssociatedDataset, but from the AssociatedDataset's parent,
% the CollocatedDataset or the core. This can be used to e.g.
% calculate an average position or to store a vector of
% line-numbers in the collapsed dataset.
%
% It is assumed that additional dimensions (no. channels, profile
% height, etc.) for the collapsed field corresponds to the one for
% the original field, which makes sense for mean, std, etc. It
% might not always be so; if it is not, one should explicitly set
% (field).stored.(processor).profile = false
%
% For examples, see the definitions in define_datasets and/or
% explore the various collapsed datasets defined in atmlab.
fieldstruct = [];
% Cell array of global limitators.
%
% These can work either in vectorised or non-vectorised form,
% depending on the value of the 'vectorised' property.
%
% In non-vectorised form (the default),
% Global limitators are applied to the subset of secondary
% footprints corresponding to a shared primary footprint before
% going into any field-specific processing. It is suitable to use
% for limiting distance or time for a certain averaging function.
% Each function handle receives as input a matrix
% of collocations, e.g. Nx14 when there are N
% secondary inside the primary, and shall
% return a logical of Nx1. All limitators are
% processed sequentially and are applied to
% all fields.
%
% In vectorised form, each global limitator is called exactly once
% for each granule. It receives a p*N*q ndarray, where p is the
% maximum number of secondaries ever inside a primary, N is the
% number of primaries for the granule, and q corresponds to
% fields (usually q=1). To conserve memory, global limitators in
% vectorised form must indicate what fields they shall be using, as
% a subset from fields in the core (CollocatedDataset.cols). Thus,
% rather than a simple cell array of function handles, it's a cell
% array of 2-element cell arrays, such as {{{'DIST'}, @(X)(X<7.5)}}.
% The function handles shall return a logical of
% size pxNx1 instructing what primaries to use, and what primaries to
% skip altogether.
overall_limitators = {}; % List of global limitators
end
properties (Transient)
% Determine if processors and limitators can be vectorised
%
% Collapsers can be very slow, because user-provided,
% Matlab-written functions are called for every primary footprint.
% For example, processing a single granule of MHS/AVHRR
% collocations can take more than an hour. In some - but not all -
% cases, limitators and processors can be vectorised, and
% processing speed can be increased by several orders of magnitude.
% This requires
vectorised = false;
end
% $Id$
methods
%% constructor
function self = Collapser(ad, fieldstruct, overall_limitators, varargin)
% constructor for Collapser class
%
% This constructs a Collapser
%
% FORMAT
%
% cl = Collapser(ad, fs, global_lims, ...)
%
% IN
%
% ad AssociatedDataset
%
% AssociatedDataset from which this Collapser should
% take its data.
%
% fs structure
%
% Structure giving full information on fields to process
% and how to process them. See property docs for details.
%
% global_lims cell array
%
% See property documentation.
%
% OUT
%
% Collapser object --- see class documentation.
%
% EXAMPLE
%
% See class docs.
cd = ad.parent;
dependency = ad;
% sort out which ones are just copied, which one processed
% smartly
self = self@AssociatedDataset(cd, {dependency}, varargin{:}); % call parent constructor
% make sure all members have at least 'processors',
% 'limiters', 'stored' and 'incore'; set appropiate
% values where they don't
fields = fieldnames(fieldstruct);
for i = 1:length(fields)
field = fields{i};
if ~isfield(fieldstruct.(field), 'processors')
error(['atmlab:' mfilename], 'No processors specified for %s', field);
end
if ~isfield(fieldstruct.(field), 'limitators');
fieldstruct.(field).limitators = {@(x)true(size(x, 1), 1)};
end
if ~isfield(fieldstruct.(field), 'incore')
fieldstruct.(field).incore = false;
end
if fieldstruct.(field).incore
fieldstruct.(field).origin = cd;
else
fieldstruct.(field).origin = ad;
end
% populate one or more 'members' for this field
procnames = fieldnames(fieldstruct.(field).processors);
for pi = 1:length(procnames)
procname = procnames{pi};
newfieldname = [upper(procname) '_' field];
newfield = fieldstruct.(field);
newfield = rmfield(newfield, 'processors');
newfield = rmfield(newfield, 'limitators');
newfield.orig_name = field;
if isfield(newfield.stored.(procname), 'type')
newfield.type = newfield.stored.(procname).type;
else
newfield.type = 'float';
end
if (isfield(fieldstruct, field) && ...
isfield(fieldstruct.(field), 'stored') && ...
isfield(fieldstruct.(field).stored, procname))
warning('off', 'catstruct:DuplicatesFound');
newfield = catstruct(newfield, fieldstruct.(field).stored.(procname));
end
self.members.(newfieldname) = newfield;
end
end
self.fieldstruct = fieldstruct;
self.overall_limitators = overall_limitators;
% add core members
self.members.FIRST.type = 'int';
self.members.FIRST.atts.long_name = 'First corresponding row in overlap';
self.members.LAST.type = 'int';
self.members.LAST.atts.long_name = 'Last corresponding row in overlap';
end
%% overload parent methods
function [M, M_cols] = merge_matrix(self, M_core, cols_core, M_self, cols_self)
M = [M_core(M_self(:, cols_self.FIRST), :) M_self];
M_cols = self.merge_new_cols(M_core, cols_core, cols_self);
end
end
methods (Access = {?SatDataset})
%% implementation abstract methods
function args = primary_arguments(~, ~)
args = {};
end
function args = secondary_arguments(~, ~)
args = {};
end
function bool = needs_primary_data(~, ~)
bool = false;
end
function bool = needs_secondary_data(~, ~)
bool = false;
end
function fields = fields_needed_for_dependency(self, fields, ~)
fields = unique(struct2cell(structfun(@(e) safegetfield(e, 'orig_name', ''), getfields(self.members, fields{:}), 'UniformOutput', false)));
fields = fields(~cellfun(@isempty, fields));
% FIXME: the above is insufficient in case the collapser is
% extended with fields from the core. Consider something like
% below, although this does not yet work. Comment out for now.
%{
fields_needed = {};
for i = 1:length(fields_considered)
field = fields_considered{i};
if isfield(self.members.(field), 'orig_name') % FIRST, LAST etc. don't have this
orig_name = self.members.(field).orig_name;
if ~isfield(self.members.(field), 'incore') || ~self.members.(field).incore
fields_needed = [fields_needed orig_name]; %#ok
end
end
end
fields_needed = unique(fields_needed);
%}
end
function [M, localcols] = process_granule(self, processed_core, ~, ~, ~, ~, ~, ~, deps, varargin)
[deps_cols, memnames] = optargs(varargin, {{self.dependencies{1}.cols}, 'all'});
M_data = deps{1};
clear deps; % otherwise duplicating potentially a lot of data
% deps_cols = cell2struct(deps_cols, cellfun(@(X) X.name, self.dependencies, 'UniformOutput', false), 2);
% deps_cols = deps_cols{1};
cols_data = deps_cols{1};
%%%
% Nomenclature note:
%
% - member is one geophysical quantity, e.g. IWP, LWP, etc.
% - field is a resulting collapsed quantity, e.g. MEAN_IWP,
% NO_LWP, etc.
% check that sizes are the same
assert(size(processed_core, 1)==size(M_data, 1), ...
['atmlab:' mfilename ':SizeError'], ...
['To average fields, core must have same no. of rows as associated. ' ...
'Core has ' num2str(size(processed_core, 1)) ' rows, ' ...
'associated has ' num2str(size(M_data, 1)) ' rows :(.']);
ad = self.dependencies{1};
cd = ad.parent;
% split by primary footprint and find first and last index
% corresponding to each footprint
[uniques, firsts] = unique(processed_core(:, [cd.cols.START1 cd.cols.LINE1 cd.cols.POS1]), 'rows', 'first');
lasts = [firsts(2:end)-1; size(processed_core, 1)];
% put sizes in self.members in order to get self.cols
if isequal(memnames, 'all')
memnames = fieldnames(self.members);
else
rqre_subset(memnames, fieldnames(self.members));
memnames = intersect(fieldnames(self.members), memnames);
memnames = union(memnames, {'FIRST', 'LAST'});
end
for mi = 1:length(memnames)
memname = memnames{mi};
% special cases 'FIRST' and 'LAST' don't have any
% corresponding data in the corresponding
% additional-dataset, so no copying is to be done either
if any(strcmp(memname, {'FIRST', 'LAST'}))
continue
end
mem = self.members.(memname);
origmem = mem.origin.members.(mem.orig_name);
if isfield(origmem, 'dims') && ~isfield(mem, 'dims')
mem.dims = origmem.dims;
end
% NB: commented out this 2013-03-25 as it caused flag
% CV_ROIWP.atts.missing_value to be reset.
% w = warning('off', 'catstruct:DuplicatesFound');
% self.members.(memname) = catstruct(...
% self.members.(memname).origin.members.(mem.orig_name), ...
% self.members.(memname));
% warning(w);
fields{mi} = self.members.(memname).orig_name;
end
self.members2cols();
fields = unique(fields(~cellfun(@isempty, fields)));
localcols = self.get_cols_from_bro(memnames, deps_cols);
% if isequal(fields, 'all');
% fields = fieldnames(self.fieldstruct);
% self.set_cols_from_bro();
% localcols = self.cols;
% else
% localcols = self.get_cols_from_bro(fields, deps_cols);
% end
nfields = max(cell2mat(struct2cell(localcols).'));
% this flag is only used until the end of the function. It signals
% that the collapser gave no data for a particular footprint, but
% only when this is due to the global limitator. This data is
% then removed, so this flag should never be exposed to the calling
% function.
%
% When the footprint itself is valid (i.e. global limitator returns
% some valid footprints) but the data are not (i.e. individual
% limitators return nothing or data are flagged to begin with), data
% are set to missing_value.
flag_notset = -realmax(); % hopefully, no real data has this value!
flag_globlimfailed = .9*-realmax();
flag_allflagged = +realmax();
M = flag_notset*ones(size(uniques, 1), nfields);
% cache some things outside the loops to speed it up a bit
% (done with profiler; code in this loop may be executed
% millions of types for processing a single day)
%fields = fieldnames(self.fieldstruct);
n_overall_limitators = length(self.overall_limitators);
has_overall_limitators = n_overall_limitators>0;
answers = cell(1, n_overall_limitators);
% field_has_limitators = cellfun(@(fname)~isempty(self.fieldstruct.(fname).limitators), fields);
% field_limitators = cellfun(@(fname) self.fieldstruct.(fname).limitators, fields, 'UniformOutput', false);
for k = 1:length(fields)
fname = fields{k};
procnames = fieldnames(self.fieldstruct.(fname).processors);
field_limitators{k} = self.fieldstruct.(fname).limitators;
field_has_limitators(k) = ~isempty(self.fieldstruct.(fname).limitators);
all_procnames{k} = procnames;
for pi = 1:length(procnames)
procname = procnames{pi};
proccers{k}{pi} = self.fieldstruct.(fname).processors.(procname);
end
end
n_orig = size(processed_core, 1);
if isempty(processed_core)
logtext(atmlab('OUT'), 'core seems empty, nothing to do!\n');
M = zeros(0, nfields);
elseif self.vectorised
% start with the easy bit
M(:, localcols.FIRST) = processed_core(firsts, cd.cols.INDEX);
M(:, localcols.LAST) = processed_core(lasts, cd.cols.INDEX);
%% Sophisticated vectorisation...
% create a matrix where each column contains the indices
% corresponding to one of the primaries; no. of rows
% corresponds to the largest occurding no. of secondaries in a
% primary, so most columns will have lots of data flagged
max_no_prim_in_sec = max(firsts(2:end)-firsts(1:end-1));
logtext(atmlab('OUT'), ...
['%s %s Commencing vectorised collapsing. ' ...
' %d primaries with up to %d secondaries per primary.\n'], ...
class(self), self.name, length(uniques), max_no_prim_in_sec);
% use a signed int for rarr; double is too large, single too
% imprecise, but I need negatives for flagged data
% NB: rarr stands for Re-ARRanged
rarr = bsxfun(@plus, vec2row(int32(firsts)), vec2col(int32(0:max_no_prim_in_sec-1)));
% flag data that shouldn't be there
rarrgt = bsxfun(@gt, rarr, vec2row(lasts));
rarr(rarrgt) = -1;
% now we have the array with indices in 'rarr'. Get
% corresponding data from data matrix, i.e. like this:
% 1 2 3 4 6 8 10 12 13 15
% -1 -1 -1 5 7 9 11 -1 14 16
% -1 -1 -1 -1 -1 -1 -1 -1 -1 17
% -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
% -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
% we will arrange two 3-D data matrices in this style: one
% for the core, one for the data. The core contains fields
% needed by global limitators and fields needed by
% processors. The data contains only fields needed by
% processors.
core_fields_needed_by_limitators = cellfun(@(X) X{1}, self.overall_limitators, 'UniformOutput', false);
proc_field_is_in_core = cellfun(@(X) self.fieldstruct.(X).incore, fields);
core_fields_needed_by_proccers = fields(proc_field_is_in_core);
core_fields_needed = union([core_fields_needed_by_limitators{:}], ...
core_fields_needed_by_proccers);
data_fields_needed = fields(~proc_field_is_in_core);
Mrar_core = nan(max_no_prim_in_sec, length(firsts), length(core_fields_needed), 'single');
Mrar_data = nan(max_no_prim_in_sec, length(firsts), sum(cellfun(@(f) length(cols_data.(f)), data_fields_needed)), 'single');
% we can afford to loop to max_no_prim_in_sec because it's only
% a few dozen at most. This loop takes 9.4 seconds for a 56 x
% 228391 x 15 array...
for i = 1:max_no_prim_in_sec
% copy over where 'rarr' is not flagged. FIXME: this
% should of course be properly selected
n = 1;
for k = 1:length(data_fields_needed)
field = data_fields_needed{k};
fieldsize = length(cols_data.(field));
Mrar_data(i, rarr(i, :)>0, n:(n+fieldsize-1)) = M_data(rarr(i, rarr(i, :)>0), cols_data.(field));
Mrar_data_cols.(field) = n:(n+fieldsize-1);
n = n + fieldsize;
end
for k = 1:length(core_fields_needed)
field = core_fields_needed{k};
Mrar_core(i, rarr(i, :)>0, k) = processed_core(rarr(i, rarr(i, :)>0), self.parent.cols.(field));
Mrar_core_cols.(field) = k;
end
% Mrar(i, rarr(i, :)>0, 1:5) = M_data(rarr(i, rarr(i, :)>0), cols_data.AVHRR_Y);
% Mrar(i, rarr(i, :)>0, 6) = M_data(rarr(i, rarr(i, :)>0), cols_data.AVHRR_FLAG_3AB);
%Mrar(i, rarr(i, :)>0, :) = processed_core(rarr(i, rarr(i, :)>0), :);
end
% now we have a p*N*q ndarray, with p the max. no. of
% secondary in primary, N the no. of primaries, and q the
% no. of fields. This replaces M_data that we clear for
% memory reasons.
clear M_data;
clear processed_core;
% go through the global limitators:
overall_global_lim = true(max_no_prim_in_sec, length(firsts));
for ii = 1:n_overall_limitators
lim_inp_fields = self.overall_limitators{ii}{1};
limitator = self.overall_limitators{ii}{2};
coreplanes = cell2mat(struct2cell(getfields(Mrar_core_cols, lim_inp_fields{:})));
this_global_lim = limitator(Mrar_core(:, :, coreplanes));
if ~isequal(size(this_global_lim), size(overall_global_lim))
error(['atmlab:' mfilename ':wrongsize'], ...
['This is %s %s speaking. I was processing vectorised ' ...
'data when I received a wrongly-sized logical from global ' ...
'limitator %s. I got: %s. But I wanted: %s. Giving up!'], ...
class(self), self.name, func2str(limitator), ...
num2str(size(this_global_lim)), num2str(size(overall_global_lim)));
end
overall_global_lim = overall_global_lim & this_global_lim;
end
% mask those data out of Mrar_data. overall_global_lim is
% just a logical so hopefully I can get away with repmat
overall_global_lim = repmat(overall_global_lim, [1, 1, size(Mrar_data, 3)]);
Mrar_data(~overall_global_lim) = NaN;
%clear overall_global_lim;
for k = 1:length(fields)
fname = fields{k};
if self.fieldstruct.(fname).incore
M_source = Mrar_core(:, :, Mrar_core_cols.(fname));
else
M_source = Mrar_data(:, :, Mrar_data_cols.(fname));
end
for pi = 1:length(all_procnames{k})
procname = all_procnames{k}{pi};
procfname = [procname '_' fname];
if ~isfield(localcols, procfname)
% probably processor is already there; in
% any case, it was not asked for
continue
end
capsed = proccers{k}{pi}(M_source);
expsize = [size(uniques, 1), length(localcols.(procfname))];
if ~isequal(size(capsed), expsize)
error(['atmlab:' mfilename ':wrongsize'], ...
['This is %s %s speaking. I was processing vectorised ' ...
'data when I received wrongly sized data from processor ' ...
'%s for field %s. I expected %s, but received %s. Giving up!'], ...
class(self), self.name, procname, fname, num2str(expsize), num2str(size(capsed)));
end
M(:, localcols.(procfname)) = capsed;
end
end
% for consistency with old method, remove completely
% instances where the overall global limitators prevented
% data
M(all(all(~overall_global_lim, 1), 3), :) = [];
self.cols = localcols;
else
%% proceed the old, slow way (OK for small data amounts)
% keep track of time for logging purposes (progress etc.)
t = 0;
logtext(atmlab('OUT'), 'Processing %d primary footprints, %d fields, %d total stats, %d values/collocation\n', ...
size(uniques, 1), length(fields), length(memnames), nfields);
j = 0; % counter increases only when there is data
for i = 1:size(uniques, 1) % need to be done in loop due to mean/std/etc.
tic;
first = firsts(i);
last = lasts(i);
M_coll_part = processed_core(first:last, :);
M_data_part = M_data(first:last, :);
j = j + 1;
% limitation to all, e.g. for distances
if ~has_overall_limitators;
lim_for_all_fields = true(size(M_data_part, 1), 1);
else
% 2nd version is much faster. This was still often the
% slowest part of the function, applying the overall
% limitators, for cases with few fields.
%answers = cellfun(@(X)(X(:)), cellfun(@(f)f(M_coll_part), self.overall_limitators, 'UniformOutput', false), 'UniformOutput', false);
for ii = 1:n_overall_limitators
answers{ii} = flat(self.overall_limitators{ii}(M_coll_part));
end
lim_for_all_fields = all(horzcat(answers{:}), 2);
if ~any(lim_for_all_fields)
M(j, :) = flag_globlimfailed;
continue
end
end
% M(j, self.cols.FIRST) = processed_core(first, cd.cols.INDEX);
% M(j, self.cols.LAST) = processed_core(last, cd.cols.INDEX);
M(j, localcols.FIRST) = processed_core(first, cd.cols.INDEX);
M(j, localcols.LAST) = processed_core(last, cd.cols.INDEX);
% for each field, apply limitations and call processing
% function on sub-limited set, if any are actually left
% also keep track if we had any valid fields here
anyvalidfields = false;
for k = 1:length(fields)
fname = fields{k};
if self.fieldstruct.(fname).incore
M_source = M_coll_part;
else
M_source = M_data_part;
end
% iteratively apply limitations, only if all
% limitations return true (including
% lim_for_all_fields) the collocation is selected for
% further average-processing
if ~field_has_limitators(k)
limsel = true(size(M_source, 1), 1);
else
limsel = lim_for_all_fields;
limmers = field_limitators{k};
for li = 1:length(limmers)
limmer = limmers{li};
% limhere = limmer(M_source(:, self.fieldstruct.(fname).origin.cols.(fname)));
limhere = limmer(M_source(:, cols_data.(fname)));
limsel = limsel(:) & limhere(:);
end
end
procnames = all_procnames{k};
if any(limsel)
% apply all processors and store in appropiate
% place
for pi = 1:length(procnames)
procname = procnames{pi};
if ~isfield(localcols, [procname '_' fname])
% probably processor is already there; in
% any case, it was not asked for
continue
end
%proccer = self.fieldstruct.(fname).processors.(procname);
% capsed = proccers{k}{pi}(M_source(limsel, self.fieldstruct.(fname).origin.cols.(fname)));
capsed = proccers{k}{pi}(M_source(limsel, cols_data.(fname)), M_coll_part(limsel, :));
%if ~isequal(size(capsed), [1 length(self.cols.([procname '_' fname]))])
if ~isequal(size(capsed), [1, length(localcols.([procname '_' fname]))])
error(['atmlab:' mfilename ':mismatch'], ...
['Error in collapsing %s %s %s. Processor ', ...
'returned wrongly-sized field. Got [%d, %d]. ', ...
'Expected [%d, %d].'], ...
self.name, fname, procname, size(capsed, 1), ...
size(capsed, 2), 1, length(localcols.([procname '_' fname])));
% size(capsed, 2), 1, length(self.cols.([procname '_' fname])));
end
M(j, localcols.([procname '_' fname])) = capsed;
% M(j, self.cols.([procname '_' fname])) = capsed;
anyvalidfields = true;
end
else
% set all processors for this field to missing_val
for pi = 1:length(procnames)
procname = procnames{pi};
if isfield(localcols, [procname '_' fname])
%M(j, self.cols.([procname '_' fname])) = self.fieldstruct.(fname).stored.(procname).atts.missing_value;
M(j, localcols.([procname '_' fname])) = self.fieldstruct.(fname).stored.(procname).atts.missing_value;
end
end
end
end
% remove if any are valid at all % (do you mean "if none are valid?")
% FIXME: make this optional
if ~anyvalidfields
M(j, :) = flag_allflagged;
end
t = t + toc;
if t > 600
logtext(atmlab('OUT'), 'Done %d/%d\n', ...
i, size(uniques, 1));
t = 0;
end
end
% remove remaining part of M, e.g. where I did too much
% pre-allocation or no relevant data was found
rest = all(M==flag_globlimfailed | M==flag_allflagged, 2);
% FIXME: what to do with the other flags?
M(rest, :) = [];
%%%
end
logtext(atmlab('OUT'), '%d collocations -> %d collapsed and valid collocations\n', ...
n_orig, size(M, 1));
end
%% overload parent methods
function S = merge_struct(varargin)
error(['atmlab:' mfilename ':notimplemented'], ...
'Collocating collapsed datasets is not implemented yet');
end
function C = get_mergefields(self) %#ok
C = {'FIRST'};
end
function new = concatenate(self, old_core_result, old_additional_result, new_additional_result)
if isempty(new_additional_result)
new = old_additional_result;
return
end
if ~isempty(old_core_result)
correction = old_core_result(end, self.parent.cols.INDEX);
new_first = new_additional_result(:, self.cols.FIRST) + correction;
new_last = new_additional_result(:, self.cols.LAST) + correction;
new_additional_result(:, self.cols.FIRST) = new_first;
new_additional_result(:, self.cols.LAST) = new_last;
end
if isempty(old_additional_result)
new = new_additional_result;
else
new = [old_additional_result; new_additional_result];
end
end
function [S, strattr] = read_homemade_granule(self, file, varargin)
info = self.find_info_from_granule(file);
dt = [str2double(info.year), str2double(info.month), str2double(info.day)];
extra_fields = optargs(varargin, {{}});
core_fields = {'LAT1', 'LON1', 'TIME1'};%, 'FIRST', 'LAST'};
all_fields = union(extra_fields, core_fields);
[M, c] = self.parent.read(dt, dt, info.satname, ...
all_fields, ...
struct(), ...
{}, ...
[self.dependencies, {self}]);
%self.parent.read_homemade_granule(self.parent.find_granule_by_datetime([2007, 1, 1], 'noaa18'), {})
if isempty(M)
S = cell2struct(repmat({[]}, [1, length(all_fields)]), all_fields, 2);
S.lat = [];
S.lon = [];
S.epoch = [];
S.time = [];
S.version = 0;
return
end
% need to call cast_fields_back for each dataset-type
% individually
[fields_core, additionals, additional_fields] = self.parent.deal_fields(all_fields, self.parent.associated);
S = self.parent.cast_fields_back(M, getfields(c, fields_core{:}));
for i = 1:length(additionals)
S = catstruct(S, ...
additionals{i}.cast_fields_back(M, ...
getfields(c, ...
additional_fields{i}{:})));
end
% from_parents = intersect(all_fields, fieldnames(self.parent.members));
% from_self = intersect(all_fields, fieldnames(self.members));
% S1 = self.parent.cast_fields_back(M, getfields(c, from_parents{:}));
% S2 = self.cast_fields_back(M, getfields(c, from_self{:}));
% S = catstruct(S1, S2);
% S = self.cast_fields_back(M, getfields(c, all_fields{:}));
S.lat = S.LAT1;
S.lon = S.LON1;
[year, month, day, hour, minute, second] = unixsecs2date(S.TIME1);
S.epoch = min(date2unixsecs(year, month, day));
S.time = S.TIME1 - S.epoch;
S.version = 0;
%error('Not implemented');
end
function set_cols_from_bro(self)
% Set cols-structure based on brothers cols
%
% When a collapsed dataset is processed, the AssociatedDataset
% it belongs to has always been processed already, so from its
% cols-structure can be inferred all the expected sizes. Use
% this to set or update the local cols-structure.
self.cols = self.get_cols_from_bro(self, fieldnames(self.members));
end
function cc = get_cols_from_bro(self, members, deps_cols)
% get cols-structure based on brothers cols
%
% When a collapsed dataset is processed, the AssociatedDataset
% it belongs to has always been processed already, so from its
% cols-structure can be inferred all the expected sizes. Use
% this to set or update the local cols-structure.
n = 1;
for i = 1:length(members)
field = members{i};
if isfield(self.members.(field), 'origin') && ...
~(isfield(self.members.(field), 'profile') && self.members.(field).profile==false)
% if it has an origin, it should have an orig_name
if isempty(fieldnames(self.members.(field).origin.cols))
ncols = length(deps_cols{...
strcmp(self.members.(field).origin.name, ...
cellfun(@(X) X.name, ...
self.dependencies, ...
'UniformOutput', false))}.(self.members.(field).orig_name));
else
ncols = length(self.members.(field).origin.cols.(self.members.(field).orig_name));
end
% FIXME/TODO: should reuse existing dimension names...
self.members.(field).dims = {sprintf('HEIGHT_%s', field), ncols};
else % should be FIRST, LAST... ncols==1
ncols = 1;
end
cc.(field) = n:(n+ncols-1);
n = n + ncols;
end
end
end
methods (Static, Access = {?SatDataset})
function do = redo_all(software_version)
% redo_all(software_version)
%
% overload and return true if some changes require that a
% dataset must be overwritten (overwrite=1) even if requested
% to be appended (overwrite=2)
vers = cellfun(@str2num, strsplit(software_version(8:end), '.'));
if length(vers) < 3 % shouldn't happen but redo
do = true;
return
end
major = vers(1);
minor = vers(2);
micro = vers(3);
if (major < 2) || ...
(major == 2 && minor < 1) || ...
(major == 2 && minor == 1 && micro < 337)
logtext(atmlab('OUT'), ...
'Found pre-fix collapsed granule, overwriting completely\n');
do = true;
else
do = false;
end
end
end
end