cpc.geofiles.loading Module
Contains functions for loading more than a single day of data at a time.
For example, say you want to load all of the forecasts valid on today’s month and day from every year between 1985 and 2010. This module is intended to make that much simpler.
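As a rough sketch of that use case, the list of YYYYMMDD dates can be built with the standard library before being passed to a loader such as load_obs. The helper same_day_each_year is hypothetical and not part of cpc.geofiles. It skips Feb 29 in non-leap years rather than guessing a substitute date.

```python
from datetime import date


def same_day_each_year(month, day, first_year, last_year):
    """Return YYYYMMDD strings for month/day in every year from first_year
    to last_year (inclusive).

    Hypothetical helper, not part of cpc.geofiles.
    """
    dates = []
    for year in range(first_year, last_year + 1):
        try:
            dates.append(date(year, month, day).strftime('%Y%m%d'))
        except ValueError:
            # The date does not exist this year (e.g. Feb 29 in a non-leap year)
            continue
    return dates


# Today's month and day, from 1985 through 2010
today = date.today()
valid_dates = same_day_each_year(today.month, today.day, 1985, 2010)
```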
Functions
all_int_to_str(input)
load_climos(valid_days, file_template, geogrid, num_ptiles=None, debug=False)
Loads climatology data
Data is loaded for a given list of days of the year. Currently the data must be in binary format, with dimensions (ptiles x grid points) when num_ptiles is an integer and (grid points [1-d]) when num_ptiles is None. The file template can contain any of the following bracketed variables:
- {mm}
- {dd}
Within a loop over the days, the bracketed variables above are replaced with the appropriate value.
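A minimal sketch of the substitution described above, assuming plain string replacement of {mm} and {dd} using each MMDD string. climo_file_names is a hypothetical helper, not the module's own code.

```python
def climo_file_names(valid_days, file_template):
    """Expand {mm} and {dd} in file_template for each MMDD string.

    Hypothetical helper that mirrors the documented substitution.
    """
    file_names = []
    for mmdd in valid_days:
        # Split an MMDD string such as '0501' into month '05' and day '01'
        mm, dd = mmdd[0:2], mmdd[2:4]
        file_names.append(file_template.replace('{mm}', mm).replace('{dd}', dd))
    return file_names


names = climo_file_names(['0101', '0102'],
                         '/path/to/files/tmean_clim_poe_05d_{mm}{dd}.bin')
# names == ['/path/to/files/tmean_clim_poe_05d_0101.bin',
#           '/path/to/files/tmean_clim_poe_05d_0102.bin']
```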
Parameters
- valid_days (list of strings): list of days of the year to load, formatted as MMDD (e.g. ['0501', '0502', '0503', '0504', '0505'])
- file_template (string): file template used to construct the file name for each day
- geogrid (Geogrid): Geogrid associated with the data
- num_ptiles (int or None): number of percentiles expected in the data file - if None then the file is assumed to be a mean or standard deviation instead of percentiles (default: None)
- debug (boolean): if True, the name of each file that data is loaded from is printed (default: False)
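The binary layout described for load_climos can be illustrated with the standard library. This sketch assumes float32 values in native byte order, no record markers, and ptiles as the slower-varying dimension. None of these details are confirmed by the documentation above, and read_climo_bin is hypothetical. With NumPy one would more typically write numpy.fromfile(path, dtype=numpy.float32).reshape(num_ptiles, -1).

```python
import os
import tempfile
from array import array


def read_climo_bin(path, num_grid_points, num_ptiles=None):
    """Read a flat binary climatology file of float32 values.

    Returns num_ptiles rows of num_grid_points values when num_ptiles is an
    int, or one flat list (e.g. a mean or standard deviation) when it is None.
    Assumes native byte order and no Fortran record markers.
    """
    values = array('f')
    with open(path, 'rb') as f:
        values.frombytes(f.read())
    rows = num_ptiles if num_ptiles is not None else 1
    expected = num_grid_points * rows
    if len(values) != expected:
        raise ValueError(f'expected {expected} values, found {len(values)}')
    if num_ptiles is None:
        return list(values)  # shape: (grid points,)
    # Percentiles: ptiles vary slowest, grid points fastest
    return [list(values[p * num_grid_points:(p + 1) * num_grid_points])
            for p in range(num_ptiles)]


# Write a toy file (3 percentiles on a 4-point grid), then read it back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy_clim.bin')
    with open(path, 'wb') as f:
        array('f', [float(i) for i in range(12)]).tofile(f)
    ptiles = read_climo_bin(path, num_grid_points=4, num_ptiles=3)
    flat = read_climo_bin(path, num_grid_points=12)
```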
Returns
- Climatology object containing the climatology data and some QC data
Examples
Load a few days of climatology data
>>> from cpc.geogrids import Geogrid
>>> from cpc.geofiles.loading import load_climos
>>> valid_days = ['0101', '0102', '0103']
>>> file_template = '/path/to/files/tmean_clim_poe_05d_{mm}{dd}.bin'
>>> geogrid = Geogrid('1deg-global')
>>> num_ptiles = 19
>>> dataset = load_climos(valid_days, file_template, geogrid,
...                       num_ptiles=num_ptiles, debug=True)
>>> print(dataset.climo.shape)
(3, 19, 65160)
>>> print(dataset.climo[:, :, 0])
[[ nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan]
 [ nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan]
 [ nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan]]
load_dtrm_fcsts(issued_dates, fhrs, file_template, data_type, geogrid, fhr_stat='mean', yrev=False, grib_var=None, grib_level=None, remove_dup_grib_fhrs=False, debug=False)
Loads deterministic forecast data
Data is loaded for a given list of dates and forecast hours. The file template can contain any of the following bracketed variables:
- {yyyy}
- {mm}
- {dd}
- {cc}
- {fhr}
Within a loop over the dates and fhrs, the bracketed variables above are replaced with the appropriate value.
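The per-file substitution for load_dtrm_fcsts might look like the following sketch. expand_fcst_template is hypothetical. It follows the documented rule that a YYYYMMDD issued date implies cycle 00, but it inserts fhr with str() and no zero-padding, which may not match how the real module formats fhrs (e.g. f006 vs f6).

```python
def expand_fcst_template(file_template, issued_date, fhr):
    """Fill {yyyy}, {mm}, {dd}, {cc} and {fhr} for one issued date and fhr.

    issued_date is YYYYMMDD or YYYYMMDDCC; a missing cycle defaults to '00'.
    Hypothetical helper; fhr is inserted with str() and is NOT zero-padded.
    """
    if len(issued_date) not in (8, 10) or not issued_date.isdigit():
        raise ValueError(f'bad issued date: {issued_date!r}')
    cc = issued_date[8:10] if len(issued_date) == 10 else '00'
    fields = {
        '{yyyy}': issued_date[0:4],
        '{mm}': issued_date[4:6],
        '{dd}': issued_date[6:8],
        '{cc}': cc,
        '{fhr}': str(fhr),
    }
    for key, value in fields.items():
        file_template = file_template.replace(key, value)
    return file_template


template = '/path/to/files/{yyyy}/{mm}/{dd}/{cc}/gfs_{yyyy}{mm}{dd}_{cc}z_f{fhr}.grb2'
# Outer loop over dates, inner loop over fhrs
file_names = [expand_fcst_template(template, d, fhr)
              for d in ['20160101', '2016010212']
              for fhr in range(0, 12, 6)]
```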
Parameters
- issued_dates (list of strings): list of issued dates in YYYYMMDD or YYYYMMDDCC format - if YYYYMMDD, the cycle is assumed to be 00
- fhrs (list of numbers or strings): list of fhrs to load
- file_template (string): file template used to construct the file name for each date and fhr
- data_type (string): data type (bin, grib1 or grib2)
- geogrid (Geogrid): Geogrid associated with the data
- fhr_stat (string): statistic to calculate over the forecast hour dimension (mean [default] or sum)
- yrev (boolean): whether the fcst data is reversed in the y-direction and should be flipped when loaded (default: False)
- grib_var (string): grib variable name (for grib files only)
- grib_level (string): grib level name (for grib files only)
- remove_dup_grib_fhrs (boolean): whether to remove potential duplicate fhrs from the grib files (default: False). This sets the grep_fhr parameter to the current fhr when calling read_grib(), which greps for the fhr in the given grib file. This is useful for grib files that, for some reason, contain duplicate records for a given variable with different fhrs; it ensures you get the record for the correct fhr.
- debug (boolean): if True, the name of each file that data is loaded from is printed (default: False)
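fhr_stat appears to collapse the forecast-hour dimension. That would explain why the example below returns one value per issued date and grid point (shape (3, 259920)) even though 20 fhrs are loaded. The following sketch shows the mean/sum reduction on plain lists. It assumes there are no missing values, since the docs do not say how the real function treats NaNs. reduce_fhrs is hypothetical.

```python
def reduce_fhrs(fcsts_by_fhr, fhr_stat='mean'):
    """Collapse a (num_fhrs x num_grid_points) nested list to one value per
    grid point, using 'mean' or 'sum' over the fhr dimension.

    Hypothetical helper; assumes no missing values.
    """
    if fhr_stat not in ('mean', 'sum'):
        raise ValueError(f"fhr_stat must be 'mean' or 'sum', not {fhr_stat!r}")
    # zip(*rows) walks the grid points, yielding a tuple of fhr values for each
    totals = [sum(values) for values in zip(*fcsts_by_fhr)]
    if fhr_stat == 'sum':
        return totals
    return [total / len(fcsts_by_fhr) for total in totals]


# Two fhrs on a three-point grid, e.g. 6-hourly precipitation amounts
fcsts = [[1.0, 2.0, 0.0],
         [3.0, 4.0, 0.5]]
mean_vals = reduce_fhrs(fcsts, 'mean')  # [2.0, 3.0, 0.25]
sum_vals = reduce_fhrs(fcsts, 'sum')    # [4.0, 6.0, 0.5]
```

A sum is the natural choice for accumulated quantities such as precipitation, while a mean suits state variables such as temperature.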
Returns
- DeterministicForecast object containing the forecast data and some QC data
Examples
Load a few days of deterministic forecast data
>>> from cpc.geogrids import Geogrid
>>> from cpc.geofiles.loading import load_dtrm_fcsts
>>> valid_dates = ['20160101', '20160102', '20160103']
>>> fhrs = range(0, 120, 6)
>>> file_template = ('/path/to/files/{yyyy}/{mm}/{dd}/{cc}/'
...                  'gfs_{yyyy}{mm}{dd}_{cc}z_f{fhr}.grb2')
>>> data_type = 'grib2'
>>> geogrid = Geogrid('0.5-deg-global-center-aligned')
>>> grib_var = 'TMP'
>>> grib_level = '2 m above ground'
>>> dataset = load_dtrm_fcsts(valid_dates, fhrs, file_template,
...                           data_type, geogrid, grib_var=grib_var,
...                           grib_level=grib_level)
>>> print(dataset.fcst.shape, dataset.fcst[:, 0])  # doctest: +SKIP
(3, 259920) [ 246.64699936 246.50599976 245.97450104]
load_ens_fcsts(issued_dates, fhrs, members, file_template, data_type, geogrid, fhr_stat='mean', yrev=False, grib_var=None, grib_level=None, remove_dup_grib_fhrs=False, debug=False)
Loads ensemble forecast data
Data is loaded for a given list of dates, forecast hours and members. The file template can contain any of the following bracketed variables:
- {yyyy}
- {mm}
- {dd}
- {cc}
- {fhr}
- {member}
Within a loop over the dates, fhrs and members, the bracketed variables above are replaced with the appropriate value.
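Because load_ens_fcsts loops over every combination of date, fhr and member, the number of files grows multiplicatively. Assuming one file per combination, as the template suggests, the example below would read 3 dates x 20 fhrs x 21 members = 1,260 files. The following sketch shows that enumeration with itertools.product, using simple string replacement (cycle 00 for YYYYMMDD dates, no zero-padding of fhr or member). ens_file_names is hypothetical.

```python
from itertools import product


def ens_file_names(issued_dates, fhrs, members, file_template):
    """Yield (issued_date, fhr, member, file_name) for every combination.

    Hypothetical sketch: the cycle defaults to '00' for YYYYMMDD dates, and
    fhr and member are inserted with str(), without zero-padding.
    """
    for issued, fhr, member in product(issued_dates, fhrs, members):
        cc = issued[8:10] if len(issued) == 10 else '00'
        name = (file_template
                .replace('{yyyy}', issued[0:4])
                .replace('{mm}', issued[4:6])
                .replace('{dd}', issued[6:8])
                .replace('{cc}', cc)
                .replace('{fhr}', str(fhr))
                .replace('{member}', str(member)))
        yield issued, fhr, member, name


template = 'gefs_{yyyy}{mm}{dd}_{cc}z_f{fhr}_m{member}.grb2'
names = [name for _date, _fhr, _member, name in
         ens_file_names(['20160101', '20160102', '20160103'],
                        range(0, 120, 6), range(0, 21), template)]
```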
Parameters
- issued_dates (list of strings): list of issued dates in YYYYMMDD or YYYYMMDDCC format - if YYYYMMDD, the cycle is assumed to be 00
- fhrs (list of numbers or strings): list of fhrs to load
- members (list of numbers or strings): list of members to load
- file_template (string): file template used to construct file names for each date, fhr and member
- data_type (string): data type (bin, grib1 or grib2)
- geogrid (Geogrid): Geogrid associated with the data
- fhr_stat (string): statistic to calculate over the forecast hour dimension (mean [default] or sum)
- yrev (boolean): whether the fcst data is reversed in the y-direction and should be flipped when loaded (default: False)
- grib_var (string): grib variable name (for grib files only)
- grib_level (string): grib level name (for grib files only)
- remove_dup_grib_fhrs (boolean): whether to remove potential duplicate fhrs from the grib files (default: False). This sets the grep_fhr parameter to the current fhr when calling read_grib(), which greps for the fhr in the given grib file. This is useful for grib files that, for some reason, contain duplicate records for a given variable with different fhrs; it ensures you get the record for the correct fhr.
- debug (boolean): if True, the name of each file that data is loaded from is printed (default: False)
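For yrev, the flip amounts to reversing the order of the grid's rows. The following sketch works on a flat list and assumes grid points are stored row by row (ny rows of nx points), matching the 1-d grid-point dimension seen in the examples. flip_y is hypothetical. With NumPy the same operation would be numpy.flipud(values.reshape(ny, nx)).ravel().

```python
def flip_y(values, ny, nx):
    """Reverse the row order of a flat, row-major (ny x nx) grid.

    Hypothetical sketch of what yrev=True is described as doing; the
    row-major storage order is an assumption.
    """
    if len(values) != ny * nx:
        raise ValueError('values does not match ny * nx')
    rows = [values[j * nx:(j + 1) * nx] for j in range(ny)]
    # Concatenate the rows again, last row first
    return [v for row in reversed(rows) for v in row]


grid = [1, 2, 3,
        4, 5, 6]                      # ny=2 rows, nx=3 columns
flipped = flip_y(grid, ny=2, nx=3)    # [4, 5, 6, 1, 2, 3]
```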
Returns
- EnsembleForecast object containing the forecast data and some QC data
Examples
Load a few days of ensemble forecast data
>>> from cpc.geogrids import Geogrid
>>> from cpc.geofiles.loading import load_ens_fcsts
>>> valid_dates = ['20160101', '20160102', '20160103']
>>> fhrs = range(0, 120, 6)
>>> members = range(0, 21)
>>> file_template = ('/path/to/files/{yyyy}/{mm}/{dd}/{cc}/'
...                  'gefs_{yyyy}{mm}{dd}_{cc}z_f{fhr}_m{member}.grb2')
>>> data_type = 'grib2'
>>> geogrid = Geogrid('1deg-global')
>>> grib_var = 'TMP'
>>> grib_level = '2 m above ground'
>>> dataset = load_ens_fcsts(valid_dates, fhrs, members, file_template,
...                          data_type, geogrid, grib_var=grib_var,
...                          grib_level=grib_level)
>>> print(dataset.ens.shape)
(3, 21, 65160)
>>> print(dataset.ens[:, :, 0])  # doctest: +SKIP
[[ 246.18849945 246.40299683 247.11050034 245.95850067 246.17949905
   246.91550064 247.41700134 246.53700104 247.96300125 246.05699921
   246.08150101 247.11800003 247.46500015 247.30050049 247.44899979
   245.84649963 247.8234993 246.21900101 246.45600128 245.72950058
   246.05299988]
 [ 246.11650085 245.45250092 247.54049759 246.35499878 245.56750107
   246.74899902 247.23949966 246.52750015 247.40500031 245.96500092
   245.85749969 246.07099915 247.3465004 246.61099854 245.78749771
   247.18349838 246.47999954 245.44049988 245.78899994 245.67700043
   245.87299957]
 [ 245.88300095 245.5995018 247.63799896 247.21050034 245.88849945
   246.78749847 246.15800018 246.15749969 246.41600113 246.00299988
   246.80950012 246.51200104 247.11650009 246.2659996 245.96800156
   247.20250168 246.22499924 245.72900162 245.85200043 244.81850128
   245.73949966]]
>>> print(dataset.ens_mean.shape)
(3, 65160)
>>> print(dataset.ens_mean[:, 0])  # doctest: +SKIP
[ 246.67957157 246.33497583 246.28476225]
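The shapes in the example above, (3, 21, 65160) for ens and (3, 65160) for ens_mean, suggest that ens_mean is an average over the member axis. The following minimal sketch shows that reduction on nested lists. It assumes a plain arithmetic mean with no missing-value handling. ensemble_mean is hypothetical.

```python
def ensemble_mean(ens):
    """Average a (num_dates x num_members x num_grid_points) nested list over
    the member axis, giving (num_dates x num_grid_points).

    Hypothetical helper; assumes no missing values.
    """
    means = []
    for members in ens:  # one issued date at a time
        # zip(*members) groups the member values for each grid point
        means.append([sum(vals) / len(vals) for vals in zip(*members)])
    return means


# One date, two members, three grid points
ens = [[[246.0, 250.0, 240.0],
        [248.0, 252.0, 241.0]]]
print(ensemble_mean(ens))  # [[247.0, 251.0, 240.5]]
```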
load_obs(valid_dates, file_template, data_type, geogrid, record_num=None, yrev=False, grib_var=None, grib_level=None, debug=False)
Loads observation data
Data is loaded for a given list of dates. The file template can contain any of the following bracketed variables:
- {yyyy}
- {mm}
- {dd}
- {hh}
Within a loop over the dates, the bracketed variables above are replaced with the appropriate value.
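The following sketch shows the load_obs substitution for one valid date. expand_obs_template is hypothetical, as is the hourly template in the usage lines. It also assumes that {hh} is filled with '00' when the date has no hour; the documentation does not say what happens in that case.

```python
def expand_obs_template(file_template, valid_date):
    """Fill {yyyy}, {mm}, {dd} and {hh} for a YYYYMMDD or YYYYMMDDHH date.

    Hypothetical helper. ASSUMPTION: when the date has no hour, {hh} is
    filled with '00'.
    """
    if len(valid_date) not in (8, 10) or not valid_date.isdigit():
        raise ValueError(f'bad valid date: {valid_date!r}')
    hh = valid_date[8:10] if len(valid_date) == 10 else '00'
    for key, value in (('{yyyy}', valid_date[0:4]), ('{mm}', valid_date[4:6]),
                       ('{dd}', valid_date[6:8]), ('{hh}', hh)):
        file_template = file_template.replace(key, value)
    return file_template


daily = expand_obs_template(
    '/path/to/files/{yyyy}/{mm}/{dd}/tmean_01d_{yyyy}{mm}{dd}.bin', '20150101')
# Hypothetical hourly template
hourly = expand_obs_template('/path/to/files/obs_{yyyy}{mm}{dd}{hh}.bin',
                             '2015010118')
```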
Parameters
- valid_dates (list of strings): list of valid dates in YYYYMMDD or YYYYMMDDHH format
- file_template (string): file template used to construct the file name for each date
- data_type (string): data type (bin, grib1 or grib2)
- geogrid (Geogrid): Geogrid associated with the data
- record_num (int or None): binary record containing the desired variable; if None, the file is assumed to contain a single record (default: None)
- yrev (boolean): whether the observation data is reversed in the y-direction and should be flipped when loaded (default: False)
- grib_var (string): grib variable name (for grib files only)
- grib_level (string): grib level name (for grib files only)
- debug (boolean): if True, the name of each file that data is loaded from is printed (default: False)
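The record_num parameter can be pictured as an offset into a file of back-to-back records. This sketch makes several assumptions that the documentation above does not confirm. Each record holds num_grid_points float32 values in native byte order, there are no Fortran record markers, and record_num is a 0-based index. read_bin_record is hypothetical.

```python
import os
import tempfile
from array import array


def read_bin_record(path, num_grid_points, record_num=None):
    """Read one record of float32 values from a flat binary file.

    Hypothetical sketch. ASSUMPTIONS: records are back to back with no
    record markers, native byte order, 0-based record_num. record_num=None
    means the file holds a single record.
    """
    values = array('f')
    with open(path, 'rb') as f:
        if record_num is not None:
            # Skip past the records that precede the requested one
            f.seek(record_num * num_grid_points * values.itemsize)
        values.fromfile(f, num_grid_points)  # raises EOFError if too short
    return list(values)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy_obs.bin')
    with open(path, 'wb') as f:
        # Two records (e.g. two variables) on a 3-point grid
        array('f', [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]).tofile(f)
    first = read_bin_record(path, 3)
    second = read_bin_record(path, 3, record_num=1)
```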
Returns
- Observation object containing the observation data and some QC data
Examples
Load a few days of observation data
>>> from cpc.geogrids import Geogrid
>>> from cpc.geofiles.loading import load_obs
>>> valid_dates = ['20150101', '20150102', '20150103']
>>> file_template = '/path/to/files/{yyyy}/{mm}/{dd}/tmean_01d_{yyyy}{mm}{dd}.bin'
>>> data_type = 'binary'
>>> geogrid = Geogrid('1deg-global')
>>> dataset = load_obs(valid_dates, file_template, data_type, geogrid)
>>> print(dataset.obs.shape, dataset.obs[:, 0])  # doctest: +SKIP
(3, 65160) [-28.48999405 -28.04499435 -27.81749725]