classdef AssociatedDataset < HomemadeDataset
    % Defines data associated with a CollocatedDataset
    %
    % The data stored with a CollocatedDataset are only the very core
    % needed to retrieve collocations. Any other data need to be stored
    % using one or more AssociatedDataset objects.
    %
    % Classes derived from AssociatedDataset describe datasets with data
    % associated with CollocatedDatasets.
    % The class AssociatedDataset itself is an abstract class and can
    % therefore not be instantiated directly. However, it may be
    % subclassed for the implementation of an arbitrary AssociatedDataset.
    % Full implementations of AssociatedDataset that come with atmlab are
    % FieldCopier and Collapser.
    %
    % The properties and methods are documented here, because subclasses
    % provide mere implementations; the signatures do not change.
    %
    % AssociatedDataset Properties:
    %
    %  abstract properties:
    %
    %   members -      Describes how data are stored in NetCDF
    %   parent -       SatDataset that this AssociatedDataset belongs to
    %   dependencies - Other AssociatedDatasets that must be processed first
    %
    %  (remaining properties from HomemadeDataset)
    %
    % AssociatedDataset Methods: (partial overview)
    %
    %  Constructor:
    %
    %   AssociatedDataset -    Create AssociatedDataset object
    %
    %  Abstract methods:
    %
    %   primary_arguments -    Get args for primary reader
    %   secondary_arguments -  Get args for secondary reader
    %   needs_primary_data -   Need data from primary?
    %   needs_secondary_data - Need data from secondary?
    %   process_granule -      Process a single granule
    %
    %  Implemented methods:
    %
    %   process_delayed -      Process data when core read from disk
    %   merge -                Combine core and associated
    %   get_mergefields -      Return fields needed to merge
    %   concatenate -          Vertically concatenate data
    %
    %  (remaining methods from HomemadeDataset)
    %
    % See also: FieldCopier (implementation), Collapser (implementation),
    % HomemadeDataset (superclass), CollocatedDataset, SatDataset.
    %
    % $Id$

    % need to know:
    % - additional arguments to reader primary
    % - additional arguments to reader secondary

    properties (Transient, Abstract, SetAccess = protected)
        % Describes how data are stored in NetCDF files
        %
        % This property gives a full description of how data are stored
        % in NetCDF files. The value of the property may differ between
        % instances (objects) of any subclass, so the property has no
        % predefined value (unlike CollocatedDataset.members).
        %members;

        % Parent dataset that this AssociatedDataset is associated with
        %
        % Pointing to a CollocatedDataset, this property describes what
        % the AssociatedDataset relates to.
        parent;

        % Other AssociatedDatasets that need to be considered first
        %
        % Contains a collection of other AssociatedDataset objects that
        % need to be considered first. For example, to process a
        % Collapser, one first needs a FieldCopier.
        %
        % See also method fields_needed_for_dependency.
        dependencies;
    end

    properties (Transient)
        % priority is dynamically set when sorting for dependencies
        priority = 0;
    end

    methods
        %% constructor

        function self = AssociatedDataset(varargin)
            % constructor for AssociatedDataset
            %
            % Note that AssociatedDataset itself cannot be constructed,
            % but some subclasses can be. In the examples below the class
            % is called AssociatedDataset, but replace this by whatever
            % class you are using to construct your object.
            %
            % FORMAT
            %
            %   ad = AssociatedDataset(cd, ...)
            %
            % IN
            %
            %   cd      CollocatedDataset
            %
            %           This argument is only present for dynamic
            %           subclasses, such as FieldCopier.
            %           Contains the CollocatedDataset to which this
            %           AssociatedDataset belongs.
            %
            %   dp      cell array
            %
            %           This argument is only present for dynamic
            %           subclasses. Cell array of other AssociatedDataset
            %           objects on which this AssociatedDataset depends,
            %           i.e. that have to be calculated first.
            %
            %   Remaining arguments are passed on to the parent class
            %   constructor. For static subclasses, all arguments are
            %   directly passed on to the parent.
            %
            % OUT
            %
            %   AssociatedDataset-derived object.
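            %
            % EXAMPLE
            %
            %   A sketch, assuming a hypothetical dynamic subclass
            %   MySubclass and an existing CollocatedDataset mycd; the
            %   trailing name/value arguments are assumed to be handled
            %   by the parent constructor:
            %
            %   % dynamic style: CollocatedDataset and dependencies first
            %   ad = MySubclass(mycd, {}, 'name', 'my_associated');
            %
            %   % static style: all arguments go directly to the parent
            %   ad = MySubclass('name', 'my_associated');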

            if nargin>0 && isa(varargin{1}, 'CollocatedDataset')
                % dynamic style
                style = 'dynamic';
                cd = varargin{1};
                dp = varargin{2};
                [subargs{1:nargin-2}] = varargin{3:end};
            else
                style = 'static';
                [subargs{1:nargin}] = varargin{:};
            end
            self = self@HomemadeDataset(subargs{:}); % call parent constructor
            if strcmp(style, 'dynamic')
                self.parent = cd;
                self.dependencies = dp;
            end
            if self.visible
                self.parent.add_associated(self);
            end
        end

        %% implement new methods

        function [M, M_cols] = merge_matrix(self, M_core, cols_core, M_self, cols_self)
            % horizontally combine core and associated data
            %
            % Combine core data, core 'cols', associated data and
            % associated 'cols'. This may or may not be trivial,
            % depending on the actual data. If more than two matrices
            % need to be merged, apply this method iteratively.
            %
            % FORMAT
            %
            %   [M_new, cols_new] = ad.merge_matrix(M_core, cols_core, M_here, cols_here)
            %
            % IN
            %
            %   M_core      matrix      selection from core collocations
            %   cols_core   structure   describes M_core
            %   M_here      matrix      selection of associated data
            %   cols_here   structure   describes M_here
            %
            % OUT
            %
            %   M_new       matrix      combination of M_core, M_here
            %   cols_new    structure   describes M_new
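            %
            % EXAMPLE
            %
            %   A sketch with hypothetical values, for this default
            %   implementation (subclasses may do something less trivial):
            %
            %   M_core = [1 2; 3 4];            % core: two columns
            %   cols_core = struct('LAT1', 1, 'LON1', 2);
            %   M_here = [10; 20];              % associated: one column
            %   cols_here = struct('MEAN', 1);
            %   [M_new, cols_new] = ad.merge_matrix(M_core, cols_core, ...
            %                                       M_here, cols_here);
            %   % M_new is [1 2 10; 3 4 20] and cols_new.MEAN equals 3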

            M = [M_core M_self];
            M_cols = self.merge_new_cols(M_core, cols_core, cols_self);
        end
    end

    methods (Access = {?SatDataset})
        %% implement new methods

        function [out, localcols] = process_delayed(self, processed_core, spec1, spec2, varargin)
            % process associated data when core data are already there
            %
            % Sometimes, core collocations already exist, but one or more
            % associated datasets do not exist yet. This method, which is
            % not designed to be called directly by the end user, takes
            % care of this.
            %
            % This method just splits a day of collocations into segments
            % and passes each segment to process_granule. That is where
            % the actual processing is done; for process_granule there is
            % no difference between processing directly and processing
            % later.
            %
            % FORMAT
            %
            %   [out, localcols] = ad.process_delayed(processed_core, ...
            %       spec1, spec2[, depies[, depcols[, fields]]])
            %
            % IN
            %
            %   processed_core  matrix
            %
            %       Matrix with processed core collocation data. The
            %       columns are described by self.parent.cols.
            %
            %   spec1           various     sat (or so) for primary
            %   spec2           various     sat (or so) for secondary
            %   depies          cell array
            %
            %       Contains output for all previous dependencies.
            %
            %   depcols         cell array
            %
            %       Contains column-descriptions for depies.
            %
            %   fields          cell array or 'all'
            %
            %       Contains fields that are to be processed.
            %
            % OUT
            %
            %   out             matrix
            %
            %       Data matrix with columns described by self.cols.
            %
            %   localcols       structure, describes columns of 'out'

            [depies, depcols, fields] = optargs(varargin, {{}, {}, 'all'});

            % data checks
            errid = ['atmlab:' mfilename ':InvalidFormat'];
            errmes = ['Data are not properly sorted: %s %s field %s has descending elements. ' ...
                'One possible cause is that %s %s granule N is entirely contained ' ...
                'in preceding granule N-1, but that granule N was not present ' ...
                'when the firstline-db was generated for %s %s, although it was ' ...
                'present when %s %s was generated. This means that granule N as collocated really ' ...
                'is mostly duplicates of N-1, resulting in the secondary potentially going ' ...
                '''back in time'' for the collocations. This problem is detected ' ...
                'if additionals are obtained separately. The solution is ' ...
                'to rerun find_granule_first_line for %s %s for today, and then ' ...
                'redo collocations for %s %s for the entire day.'];
            assert(all(diff(processed_core(:, self.parent.cols.START1))>=0), ...
                errid, errmes, ...
                class(self.parent), self.parent.name, 'START1', ...
                class(self.parent.primary), self.parent.primary.name, ...
                class(self.parent.primary), self.parent.primary.name, ...
                class(self.parent), self.parent.name, ...
                class(self.parent.primary), self.parent.primary.name, ...
                class(self.parent), self.parent.name);
            assert(all(diff(processed_core(:, self.parent.cols.START2))>=0), ...
                errid, errmes, ...
                class(self.parent), self.parent.name, 'START2', ...
                class(self.parent.primary), self.parent.primary.name, ...
                class(self.parent.primary), self.parent.primary.name, ...
                class(self.parent), self.parent.name, ...
                class(self.parent.primary), self.parent.primary.name, ...
                class(self.parent), self.parent.name);

            % divide into segments where a new primary or a new secondary
            % granule starts
            [~, newprim] = unique(processed_core(:, self.parent.cols.START1), 'rows', 'first');
            [~, newsec] = unique(processed_core(:, self.parent.cols.START2), 'rows', 'first');

            % also add 'end' to it, because we want to determine segments
            newseg = unique([newprim; newsec]);

            % empty data-structs are all I pass to processors not needing
            % data
            data1 = struct();
            data2 = struct();

            primseg = 0;
            seconseg = 0;
            out = [];
            logtext(atmlab('OUT'), 'Processing %d segments\n', length(newseg));
            for segcount = 1:length(newseg)
                logtext(atmlab('OUT'), 'Processing segment %d/%d\n', segcount, length(newseg));
                segstart = newseg(segcount);
                % end of segment: either beginning of next, or end of data
                if segcount < length(newseg)
                    segend = newseg(segcount+1)-1;
                else
                    segend = size(processed_core, 1);
                end
                % keep track of 'primary segment' and 'secondary segment'
                % to know corresponding date1, data1, etc., read any
                % needed granule data, and pass the segment on to
                % process_granule
                if primseg
                end
            end
        end

        function S = merge_struct(self, S_core, S_self)
            % merge structures as obtained from read_homemade_granule
            %
            % In most cases this is a simple structure-concatenation, but
            % for Collapsers and some other AssociatedDatasets it is more
            % involved.

            % Note: this uses undocumented behaviour
            status = warning('error', 'catstruct:DuplicatesFound');
            S = catstruct(S_core, S_self);
            warning(status);
        end

        function C = get_mergefields(self) %#ok
            % Get minimum fields required to do merging
            %
            % In some cases, merging requires a certain minimum of fields
            % in order to be performed. This method returns that minimum
            % for a particular object (usually constant per class).
            %
            % FORMAT
            %
            %   C = ad.get_mergefields();
            %
            % IN
            %
            %   (none)
            %
            % OUT
            %
            %   C   cell array of strings   names of needed fields
            %
            C = {};
        end

        function new = concatenate(self, old_core_result, old_additional_result, new_additional_result)
            % concatenate old and new data matrices
            %
            % To concatenate old and new data matrices, sometimes some
            % fields need to be corrected; otherwise this is trivial.
            % However, always use this method to concatenate data, just
            % in case data have to be corrected. An example where this is
            % necessary is the Collapser, where FIRST and LAST need to be
            % corrected.
            %
            % FORMAT
            %
            %   new = ad.concatenate(old_core, old_addi, new_addi)
            %
            % IN
            %
            %   old_core    matrix  old core result
            %   old_addi    matrix  old additional result
            %   new_addi    matrix  new additional result
            %
            % OUT
            %
            %   new         matrix  concatenated additional result
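            %
            % EXAMPLE
            %
            %   A sketch with hypothetical matrices; for this base
            %   implementation the result is a plain vertical
            %   concatenation, with empty inputs handled gracefully:
            %
            %   new = ad.concatenate(old_core, old_addi, new_addi);
            %   % size(new, 1) equals size(old_addi, 1) + size(new_addi, 1)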

            if isempty(new_additional_result)
                new = old_additional_result;
            elseif isempty(old_additional_result)
                new = new_additional_result;
            else
                new = [old_additional_result; new_additional_result];
            end
        end

        function members2cols(self)
            % converts self.members to corresponding self.cols
            %
            % Assumes sizes in self.members are correct. This may not
            % always be the case before the first data are read!
            % This method has no input or output, because it operates
            % entirely on the object itself.
            %
            % FORMAT
            %
            %   ad.members2cols()
            %
            % IN
            %
            %   (none, but uses members)
            %
            % OUT
            %
            %   (none, but sets cols)
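            %
            % EXAMPLE
            %
            %   A sketch with hypothetical members: a scalar field and a
            %   field with three elements per collocation,
            %
            %       self.members.MEAN.type = 'float';
            %       self.members.T_B.dims  = {'channel', 3};
            %
            %   would, after ad.members2cols(), yield
            %
            %       self.cols.MEAN == 1
            %       self.cols.T_B  == 2:4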

            allnames = fieldnames(self.members);
            tot = 1;
            for i = 1:length(allnames)
                fname = allnames{i};
                fl = self.members.(fname);
                if isfield(fl, 'dims')
                    no = self.members.(fname).dims{2};
                else
                    no = 1;
                end
                % self.cols is in HomemadeDataset
                self.cols.(fname) = tot:(tot+no-1);
                tot = tot + no;
            end
        end
    end

    methods (Static, Access = {?SatDataset})
        function M_cols = merge_new_cols(M_core, cols_core, cols_self)
            % merge different cols-structures describing a matrix of data
            %
            % This static method merges two cols-structures that describe
            % a matrix of data.
            %
            % FORMAT
            %
            %   M_cols = AssociatedDataset.merge_new_cols(M_core, cols_core, cols_self)
            %
            % IN
            %
            %   M_core      matrix      original (core) data
            %   cols_core   structure   describes M_core
            %   cols_self   structure   describes the new (associated) data
            %
            % OUT
            %
            %   M_cols      structure   describes the merged matrix

            M_cols = catstruct(cols_core, ...
                structfun(@(x)x+size(M_core, 2), cols_self, 'UniformOutput', false));
        end

        function do = redo_all(~)
            % redo_all(software_version)
            %
            % overload and return true if some changes require that a
            % dataset must be overwritten (overwrite=1) even if requested
            % to be appended (overwrite=2)
            do = false;
        end
    end

    % those methods must be implemented by subclasses
    methods (Abstract, Access = {?SatDataset})
        % Arguments to pass on to primary reader
        %
        % This method returns a cell array with arguments that shall be
        % passed on to the primary reader.
        %
        % FORMAT
        %
        %   args = ad.primary_arguments(fields)
        %
        % IN
        %
        %   cell array of strings with fields, or 'all' (default)
        %
        % OUT
        %
        %   cell array with arguments passed on to the primary reader
        args = primary_arguments(self, varargin)

        % Arguments to pass on to secondary reader
        %
        % See primary_arguments.
        args = secondary_arguments(self, varargin)

        % Whether primary data is used at all
        %
        % This method is used in 'late' processing, e.g. when the
        % collocations already exist but associated data do not. For late
        % processing, not all original data may need to be re-read.
        % This method tells whether the primary data should be re-read.
        %
        % FORMAT
        %
        %   reread = ad.needs_primary_data('all')
        %
        % IN
        %
        %   optionally, cell array of strings with fields; defaults to
        %   'all'
        %
        % OUT
        %
        %   logical scalar (boolean), true if data must be re-read
        bool = needs_primary_data(self, varargin)

        % Whether secondary data is used at all
        %
        % See needs_primary_data.
        bool = needs_secondary_data(self, varargin)

        % Process a single granule
        %
        % This is the core method for any AssociatedDataset
        % implementation. It takes collocations as processed by a
        % CollocatedDataset, as well as original data from the primary
        % and the secondary (if so requested by needs_primary_data and
        % needs_secondary_data). It then does the necessary processing
        % (such as copying, in the case of FieldCopier).
        % It must also set self.cols correctly.
        %
        % This method is not normally called directly by the user.
        % However, it is to be re-implemented in any special-purpose
        % AssociatedDataset.
        %
        % FORMAT
        %
        %   out = ad.process_granule(processed_core, ...
        %       data1, date1, spec1, ...
        %       data2, date2, spec2, ...
        %       dependencies)
        %
        % IN
        %
        %   processed_core  matrix
        %
        %       Matrix with one row for each collocation and columns
        %       described by CollocatedDataset.cols.
        %       This is the output of CollocatedDataset.process.
        %
        %   data1           structure
        %
        %       Full data for the primary, as output by the primary
        %       reader.
        %
        %   date1           datevec     date/time for primary granule
        %   spec1           various     specification (e.g. sat) for primary
        %   data2           like data1, but for secondary
        %   date2           like date1, but for secondary
        %   spec2           like spec1, but for secondary
        %
        %   dependencies    cell array
        %
        %       Cell array with elements corresponding to
        %       AssociatedDataset.dependencies. For each dependency, this
        %       cell array contains an element with the output of the
        %       process_granule method for that particular
        %       AssociatedDataset. For example, for a Collapser it will
        %       contain a single element with the output of
        %       FieldCopier.process_granule.
        %
        %   depcols         cell array of structures describing the
        %                   columns of dependencies
        %
        %   fields          cell array
        %
        %       Fields to process; 'all' for all fields (until recently,
        %       the only option).
        %
        % OUT
        %
        %   out         matrix containing the data
        %
        %   localcols   structure describing the columns of 'out'. If all
        %               fields are processed, this is simply self.cols.
        [out, localcols] = process_granule(self, processed_core, data1, date1, spec1, data2, date2, spec2, dependencies, depcols, fields)

        %store(self, date, spec, result)

        fields = fields_needed_for_dependency(self, fields, dependency)
    end
end
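
% A minimal sketch (assumed, not from atmlab) of how a concrete subclass
% might implement two of the abstract methods above; the requested field
% name 'lat' is hypothetical:
%
%   function args = primary_arguments(self, varargin)
%       % always ask the primary reader for the hypothetical field 'lat'
%       args = {'lat'};
%   end
%
%   function bool = needs_primary_data(self, varargin)
%       bool = true;  % this subclass always re-reads the primary
%   end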