readme_3a26 Documentation for 3A-26, version 5.01 ................................................................ Algorithm 3A-26: Estimation of Space-Time Rain Rate Statistics Using a Multiple Thresholding Technique Name of Contact: Bob Meneghini NASA/GSFC Ph: 301-286-9128 email: bob@meneg.gsfc.nasa.gov Date: 24 August 1999 1a. Objectives of the algorithm: The primary objective of 3a-26 is to compute the rain rate statistics over 5 degree (latitude) x 5 degree (longitude) x 1 month space-time regions. The output products include the estimated values of the probability distribution function of the space-time rain rates at 4 'levels' (2 km, 4 km, 6 km and path-averaged) and the mean, standard deviation, and probability of rain derived from these distributions. Three different rain rate estimates are used for the high resolution rain rate inputs to the algorithm: the standard Z-R (or 0th-order estimate having no attenuation correction), the Hitschfeld-Bordan (H-B), and the rain rates taken from 2a-25. (Fits based on the high resolution inputs from the surface reference technique are output to the diagnostic file for evaluation). This algorithm is based on a statistical procedure. Although the radar team believes that a statistical method of this type should be implemented for TRMM, the method is relatively new and the testing has been carried out only on simulated data and on preliminary TRMM data. Caution on the use of the results is well warranted. 1b. Description of the Method: A general understanding of the method can be gained by noting that the amount of attenuation in the TRMM radar signal depends on the 2-way path attenuation down to the range gate of interest. This attenuation increases as the range gate is taken deeper into the storm (closer to the surface) and as the rain rate increases. Although some general features of the rain are used in 2a-25, the rain rate estimates are obtained at each instantaneous field of view (IFOV) of the instrument. The space-time statistics of these high resolution estimates are done in 3a-25. Most users of the TRMM radar data will be interested in the output data from 3a-25 and not the data from 3a-26. Algorithm 3a-26 serves as an alternative way of estimating the space-time rain statistics. The idea behind the method is that because of attenuation at high rain rates and low signal to noise ratios at light rain rates, there will usually exist an intermediate region over which the rain rate estimates are most accurate. Using only these estimates and an assumption as to the form of the probability distribution function (log-normal), the parameters of the distribution can be found by minimizing the rms difference between the hypothetical distribution and the values of the distribution obtained directly from the measurements. Once the distribution is estimated, the mean and standard deviation of the distribution can be calculated [Refs. 1-2]. Useful by-products from the calculation of the probability distribution of rain rates are the fractional areas above (or below) particular rain rate thresholds. These data can be used as inputs to some of the area-time integral (ATI) methods that have been proposed [Refs. 3-5]. Although the data can be used to implement the ATI method, the method used in 3a-26 is itself not an ATI method. The behavior of the estimates depends strongly on the magnitude and type of threshold as well as the method that is used to determine the high resolution rain rates. There are 3 methods that are used to determine the high resolution rain rates: the Z-R (0th order without attenuation correction), the Hitschfeld-Bordan (H-B), and the hybrid method of 2a-25. A fourth method, based solely upon the surface reference method, is implemented in the code but the results are output only to a diagnostic file for evaluation. For the 3 estimates of rain rate (Z-R, HB and 2a-25), Q (or zeta as defined in 2a-25) is used as the threshold parameter. What this means is that if the threshold is set to a particular value, Q*, then if the measured value of Q is less than Q*, the corresponding rain rate is accepted - that is, it is used to update the distribution function of rain rates. On the other hand, if Q exceeds Q* the corresponding rain rate estimate is rejected - that is, it is not used to update the distribution function. As the threshold value, Q, is increased a larger percentage of the rain rates will be accepted. The converse holds so that as Q is decreased a smaller percentage of the rain rates will be used in estimating the distribution function. It should be noted that Q is a proxy for the attenuation and usually assumes a value between 0 and 1. If the Z-R method of estimating the high resolution rain rates is considered, the corresponding output files include the rain rate distribution function, zeroOrderpDf, and the mean, standard deviation, and probability of rain derived from the distribution, zeroOrderFit, for 6 different values of the Q threshold. The six values of Q are: 0.1, 0.2, 0.3, 0.5, 0.75 and 0.9999. Which set of values corresponding to which threshold should be used ? Simulations suggest that if the total number of rain points is on the order of 500 to 1000, the best accuracy is usually obtained by using a threshold value of 0.3. This corresponds to the 3rd array element so that the monthly mean rain rate (using the Z-R method) over the 5 x 5 degree box (lat, long) at height level, ih, is given by: mean = zeroOrderFit(lat, long, 1, ih, iq = 3) The standard deviation and probability of rain are given by: std dev = zeroOrderFit(lat, long, 2, ih, iq = 3) Pr (Rain) = zeroOrderFit(lat, long, 3, ih, iq = 3) Simulations indicate that for a large number of rain points (N > 5000), the use of smaller threshold values (Q = 0.1 or 0.2) leads to better estimates of the mean space-time rain rate. In the case of Q = 0.2 we have: mean = zeroOrderFit(lat, long, 1, ih, iq = 2) std dev = zeroOrderFit(lat, long, 2, ih, iq = 2) Pr (Rain) = zeroOrderFit(lat, long, 3, ih, iq = 2) A useful set for comparison is the choice: Q = 0.999 (array element 6). In this case nearly all of the Z-R rain rate estimates are accepted so that the method reduces to fitting almost all the Z-R derived rain rates to a log-normal distribution: mean = zeroOrderFit(lat, long, 1, ih, iq = 6) std dev = zeroOrderFit(lat, long, 2, ih, iq = 6) Pr (Rain) = zeroOrderFit(lat, long, 3, ih, iq = 6) The estimate of the mean as determined from the zeroOrderFit HDF output variable should be considered the primary output of the algorithm. Since Q = 0.3 is considered, nominally, as the optimum choice of threshold, the variable, rainMeanTH, has been defined to store these values. In particular: rainMeanTH(lat,long,ih) = zeroOrderFit(lat,long,ih,1,3) The accuracy of the results at other Q thresholds and the statistics derived from the Hitschfeld-Bordan (hbFit) and rain rates from 2a-25 (fit2A25) will be evaluated as additional data from the TRMM radar become available. 1c. Relationship of 3a-26 outputs to those of 3a-25 In comparing the statistics from 3a-25 and 3a-26 there are 2 differences between these data sets that should be kept in mind. The first is that the statistics produced from 3a-25 are conditioned either on the presence of rain or on the presence of a particular type of rain (stratiform or convective). For the 3a-26 products the means and standard deviations derived from the zeroOrderFit, hbFit and fit2A25 arrays are unconditioned - that is, the statistics include both rain and no-rain events. The second difference is that the set of heights for the 3a-26 products is a subset of the heights used for the (low resolution) products of 3a-25. For the 3a-26 products, the height levels relative to the ellipsoid are: hlevel height above ellipsoid ____________________________________________ 1 2 km 2 4 km 3 6 km 4 path-average For 3a-25 products, the height levels relative to the ellipsoid are: hlevel height above ellipsoid ______________________________________________ 1 2 km 2 4 km 3 6 km 4 10 km 5 15 km 6 path-average In an earlier versions of the program, the height levels were defined relative to the local surface. In the latest versions of 3a-25 and 3a-26 all heights are measured relative to the earth's ellipsoid. As an example, assume that the monthly rain accumulations, MRA (millimeters/month), are to be computed over the 5 degree x 5 degree latitude-longitude box specified by (lat, long) for the rain rates measured at a height level given by hlevel. From 3a-25, the mean rain rate (mm/hr), conditioned on rain being present at height level, ih, is given by: rainMean1(lat, long, ih). To convert this to an unconditioned mean rain rate the quantity is first multiplied by the probability of rain. This can be approximated by the ratio of the number of rain counts (rainPix1(lat,long,ih)) to the total number of observations over the month (ttlPix1(lat, long)). To convert this to a monthly accumulation, the unconditioned rain rate is multiplied by the number of hours in a (30 day) month, 720, so that the MRA (mm/month) as derived from the 3a-25 products, is: MRA(3a-25) = rainMean1(lat,long,ih)*PrRain(lat,long,ih)*720 where PrRain(lat,long,ih) = rainPix1(lat,long,ih)/ttlPix1(lat,long) From the 3a-26 products, the MRA (mm/month), using the zeroth-order estimate (Z-R) is: MRA(3a-26) = zeroOrderFit(lat,long,ih,1,iqthres)*720 For the 3rd threshold, Q = 0.3, the MRA is MRA(3a-26) = zeroOrderFit(lat,long,ih,1,3)*720 or, equivalently, MRA(3a-26) = rainMeanTH(lat,long,ih)*720 1d. Relationship between 3a-26 and the fractional areas above particular thresholds The single threshold technique (ATI) uses the fractional area above a particular rain rate threshold as a linear estimator for the area-average rain rate. Estimates of the fractional areas above a threshold can be obtained from the estimated distribution functions described above. As noted above, the counts, which are proportional to the probability distribution functions of rain, are stored in the arrays: zeroOrderFit(16, 72, 4, 3, 6) hbFit(16, 72, 4, 3, 6) fit2A25(16, 72, 4, 3, 6) where the 5 dimensional array refers to: (latitude, longitude, height, fitting parameter, Q threshold) and where fitting parameter = 1 (mean value of log-normal distribution) = 2 (standard deviation of log-normal distribution) = 3 (probability of rain) It is important to note that these counts only include rain counts. To add in the no-rain counts, note that the total number of counts, ntot(lat,long), and the total number of rain counts (at level ih), nrain(lat,long,ih), are output variables so that N_no-rain(lat,long,ih) = ntot(lat,long) - nrain(lat,long,ih) The probability distribution function, zeroOrderpDf' (unnormalized), that includes the no-rain cases is given by: zeroOrderpDf'(lat,long,ir,ih,iq) = zeroOrderpDf(lat,long,ir,ih,iq) + N_no-rain(lat,long,ih) for ir = 1,..,25 iq = 1,..,6 the formulas for hbpDf and pDf2a25 are identical The variable zeroOrderpDf'(lat,long,irainth,ih,iqthres) is the number of rain counts above the rain rate threshold corresponding to the 'ir' indice. Denote this rain rate by RR_th. The fractional area below the rain rate threshold RR_th at height level ih at the Q threshold, iq, using the Z-R estimates of rain rates is: Fr_Area {R < RRth} (lat,long,ih) = zeroOrderpDf'(lat,long,ir,ih,iq)/ntot(lat,long) The fractional area above this threshold is: Fr_Area_RR>RRth_0th(lat,long,ih) = 1 - Fr_Area {R < RRth} (lat,long,ih) For example, to compute the fractional area above the threshold of 2.05 mm/h (ir = 9 - see definition of RRcategories in 3b below) at the Q threshold of 0.9999 (iq = 6) at a height of 2 km above the ellipsoid (ih =1) the following equations are used: zeroOrderpDf'(lat,long,ir=9,ih=1,iq=6) = zeroOrderpDf(lat,long,ir=9,ih=1,iq=6) + + N_no-rain(lat,long,ih=1) N_no-rain(lat,long,ih=1) = ntot(lat,long) - nrain(lat,long,ih=1) Fr_Area {R<2.05} (lat,long,ih=1) = zeroOrderpDf'(lat,long,ir=9,ih=1,iq=6)/ntot(lat,long) Fr_Area {RR>2.05} (lat,long,ih=1) = 1 - Fr_Area {R<2.05} (lat,long,ih=1) So that the fractional area above 2.05 mm/h over the 5 x 5 degree box (lat, long) over the month (which uses the Z-R derived rain rates and nearly all the data, iq = 6) can be expressed in terms of the HDF outputs: zeroOrderpDf(lat,long,ir=9,ih=1,iq=6) nrain(lat,long,ih=1) ntot(lat,long) 1e. Reliability estimates: rms difference between experimentally determined values of the pDf and the fitted values of the pDf at those values for which the experimentally-determined pDf increases monontonically. reliabZeroOrder(16,72,4,6) reliabHB(16,72,4,6) reliabSRT(16,72,4,6) if the number of data points is too few or an error occurs in the fitting procedure, the following default values for reliab* and *Fit will be used: if # of rain occurences, nrain, in the (lat, long) box are too few (nrain < qqmin .and. nrain.ne.0) (qqmin = 250) *Fit and reliab* will be set to -999. if number of data points is fewer than twice the # of unknowns or if the number of threshold levels is too few or if warning or fatal error occurs in the fitting *Fit and reliab* will be set to -777. if unconditioned mean rain rate is greater than 3 mm/h but distribution at R = 0.1 mm/h is concave (postivie 2nd derivative) reliab* set to -888 but *Fit parameters are output in normal fashion if unconditioned mean rain rate is greater than 3 mm/h and distribution at R = 0.1 mm/h is convex (negative 2nd derivative) reliab* set to -555 and *Fit parameters are output in normal fashion if distribution at R = 0.1 mm/h is convex (negative 2nd derivative) but unconditioned mean is less than or equal to 3 1f. Definition of the latitude-longitude boxes The products are defined on a 5 degree x 5 degree x 1 month grid that covers the TRMM orbit. The latitude boxes are labeled from 1 to 36 where box 1 covers from 40 S to 35 S and box 36 covers from 35 N to 40 N. The longitude boxes are labeled from 1 to 72 where box 1 runs from 180 W to 175 W and box 72 runs from 175 E to 179.999 E. 2a. Source Code: (anonymous ftp site: priam.gsfc.nasa.gov, directory: pub/trmm_code/v4_3a26/ f3a26_v5.01_HDF.f (138.7 kBytes) readinto_3a26_v5.01.f (18.0 kBytes) r1mach.f (13.1 kBytes) cmlib.f (172.5 kBytes) xsetun.f (1.1 kBytes) 2b. Notes on the processing procedure: 1. assume that 1 granule of data corresponds to 1 orbit 2. the program is set up to read a scan line of data at a time from 1C-21, 2A-21, 2A-23 and 2A-25 until the full granule of data has been processed 3. as all the output products are over a 5 degree x 5 degree x 1 month space-time region, after each granule is processed, the program will write the partially accumulated products to temporary storage. When the next processing cycle begins, these products will be read from temporary storage, and then overwritten once the updated statistics are completed. 4. at the end of the processing cycle (1 month), a subroutine within the program will be used to output the desired statistics to the HDF file. 5. for the 0th and HB estimates of rain rate, the EDR (effective dynamic range) is based on the quantity zeta (as defined in 2a-25) or Q (as defined in 3a-26), where: Q = zeta = 0.2 ln10 beta int(0,r) [alpha* Zm**beta] where Zm is the 'measured' or 'apparent' reflectivity factor k = alpha * Z ** beta, where k = attenuation coefficient or specific attenuation (dB/km) Z = actual reflectivity factor (mm**6/m**3) the coefficients alpha, and beta are read from the 2a-25 output note that as Q = zeta go to one the path-integrated-attenuation (pia) increases w/o bound if k-Z relationship is exact 6. for the multiple threshold method, a necessary condition is that the random variable (that characterizes the radar-measured quantity) be a monotonic function of the quantity that we wish to measure. For example, in the presence of attenuation, the apparent reflectivity factor at the surface Zm(surface) [or the rain rate estimate based on zm] is a non-monotonic function of the true rain rate; as such it is not an appropriate choice for this method. On the other hand, Q (or zeta) is a monotonic function of the path-integrated rain rate and is appropriate for the 0th order and HB estimates of RR 7. for the path-integrated attenuation (PIA) as derived from the surface reference technique (and the corresponding rain rate), an appropriate proxy variable is the SRT estimate of PIA itself. Results based on the SRT are output only to diagnostic file. 8. the motivating principle of the multiple threshold method is that for area-wide estimates of the rain rate it is more accurate to exapolate to the low and high regions of rain rate than to attempt to measure the distribution of these values directly. Reasons for the possible poor performance of the radar at high and low rain rates are: i. low SNR at low rain rates ii. low SNR at high rain rates and deeper storm penetrations due to signal attenuation iii. higher variability in Z-R laws at low rain rates 9. in the multiple threshold method, an effective dynamic range (EDR) is selected. the EDR is defined as the region over which the rain rates or Zm estimates are expected to have the highest accuracy (where the signal-to-noise ratio is high and attenuation is low) 10. presently, the maximum number of thresholds within the EDR is taken to be 25 (the optimum number is still an issue and will depend upon the number of samples and the range of the variable) 11. if the number of samples of the histogram is small, then the estimated pDf will be of dubious worth. To circumvent this problem, assume that the total number of IFOVs over the averaging domain, with rain present be larger than some number, iqqmin. 2c. Running the program (on priam.gsfc.nasa.gov workstation: i type: 'make' [uses Makefile in same subdirectory] ii type: 'f3a26_v5.01_HDF "1C-21 input file" "2A-21 input file" "2A-23 input file" "2A-25 input file" "3a-26 output file" "3a-26 int file" '3a-26 verification file" "begin/middle/end granule of month: ch string" "year: ch string" "month: ch string"' where, for example, "1C-21 input file" = 1C21.970317.100.1.HDF "2A-21 input file" = 2A21.970317.HDF "2A-23 input file" = 2A23.970317.100.1.HDF "2A-25 input file" = 2A25.970317.100.1.HDF "3A-26 output file" = 3A26_970317_B.HDF (for first granule of month) = 3A26_970317_E.HDF (for last granule of month) "3a-26 int file" = 3A26_970317_INT (intermediate file) "3a-26 VI file" = 3A26_970317_DIAG_B (verification or diagnostic file for 1st granule) "begin/middle/end granule" = character string equal to 'BEGIN' 'MIDDLE' or 'END' "year" = character string - input as integer "month" = character string - input as integer 3a. Input Data (Same as 3a-25): Data are read from 1C-21, 2A-21, 2A-23, and 2A-25. The data volumes per scan are approximately: 0.9 kbytes from 2a-21 0.4 kbytes from 2a-23 42.0 kbytes from 2a-25 17.0 kbytes from 1C-21 Total: 60 kbytes per scan 3b. Input Parameters (initialized in 3a-26) data QUthres0th /0.1,0.2,0.3,0.5,0.75,0.9999/ ! Q-thresholds for Z-R and 2a-25 rain rates data QUthresHB /0.1,0.2,0.3,0.5,0.75,0.9999/ ! Q-thresholds for HB data QLsrt /1.5,1.,0.8,0.6,0.4,0.1/ ! PIA-thresholds for SRT The rain rate distribution functions consist of the count values in the following 25 rain rate categories data RRcategories/0.205, 0.27, 0.3646, 0.4863, 0.648, 0.865, ! in mm/hr 1.153, 1.537, 2.050, 2.734, 3.646, 4.862, 6.484, 8.6468, 11.531, 15.376, 20.505, 27.344, 36.463, 48.625, 64.84, 86.47, 115.31, 153.76, 205.048/ At present, a log-normal fitting through the points of the rain rates distribution is made only when the following condition is satisfied: iqqmin = 200 ! minimum number of valid rain occurences needed for fitting to be done In identifying the number of valid thresholds, we require that the count value increase by a certain amount from between successive rain rates thresholds; in fact, the upper rain rate threshold can be identified as that threshold beyond which the count value does not increase by at least 'nfu' counts, where: nfu = 30 4a. Output Variables a. arrays for the calculation of probabilities [int*4] ttlCount(16,72) = # of observations at each 5 x 5 degree box over the month rainCount(16,72,4) = # of rain observations at each 5 x 5 degree box over the month (rain present) Note that all height levels are measured relative to the ellipsoid nrain(i,j,1) @ h = 2 km nrain(i,j,2) @ h = 4 km nrain(i,j,3) @ h = 6 km nrain(i,j,4) path-averaged b. arrays for output of 'truncated' histograms at each 5 x 5 x 1 mo box for 3 RR estimates/ 4 'levels' zeroOrderpDf(16,72,25,4,6) = number of counts in the probability distribution function (25 categories) using 0th order (Z-R) rain rate estimate heights with respect to the ellipsoid of 2, 4, 6 km and path-av for 6 Q thresholds hbpDf(16,72,25,4,6) = same as above except using the HB estimate of rain rate pDf2A25(16,72,25,4,6) = same as above except using the rain rate estimates from 2a-25 convention for zeroOrderpDf(16,72,25,4,6), hbpDf(16,72,25,4,6), and pDf2A25(16,72,25,4,6) first arg: latitude second: longitude third: rain rate category for pDf fourth: height 'level': 1 = RR @ 2 km 2 = RR @ 4 km 3 = RR @ 6 km 4 = path-averaged RR fifth: Q threshold c. {mean, std dev, Pr(Rain) derived from log-normal assumption to rain rate distribution zeroOrderFit(16,72,4,3,6) : 3 statistics [mean, std dev, Pr(R)] of distribution fit of the rain rates as derived from the 0th (Z-R) method for 6 thresholds at 4 'levels' hbFit(16,72,4,3,6) : same as above except Hitschfeld-Bordan method used for rain rate estimates fit2A25(16,72,4,3,6) : same as above except data from 2a-25 are used for rain rates estimates d. reliability factors: reliabOrderFit(16,72,4,6) reliabHBfit(16,72,4,6) reliab2A25fit(16,72,4,6) if the number of data points is too few or an error occurs in the fitting procedure, some default values for reliab* and *Fit will be used. Tentatively, if # of points are too few *Fit and reliab* will be set to -999. if warning or fatal error occurs in the fitting *Fit and reliab* will be set to -777. if number of data points is fewer than the # of unknowns or if the pDf doesn't increase monotonically 4b. Output Data Volume (per month): Approximately 9.7 Mbytes 5. Internal Storage Requirements: One intermediate files is needed to store the running statistics. The storage needed is somewhat larger than the output data volume: 11.1 Mbytes 6. Processing Procedure: The basic steps in the procedure are (first 4 are similar to 3a-25 algorithm): i. read in data (scan by scan) from 2a-21, 2a-23, 2a-25 and 1c-21 ii. adjust the range gate numbering conventions so that Zm, Zt and R are aligned properly iii. find the coarse boxes to which the 49 IFOVs belong (coarse resolution boxes are 5 degree x 5 degree latitude-longitude boxes) iv. resample Zm, Zt and R from the range direction onto the vertical v. update the estimated probability distribution function for the various rain rate methods at each 5 x 5 degree box at the various heights, and for threshold values. vi. if the granule crosses the month boundary, do a nonlinear least squares fit to the distributions determined in step 5, assuming a log-normal distribution; from the fitting parameters, calculate the mean, std deviation and probability of rain for each distribution. vii. Re- initialize the intermediate file 7. Comments and Issues: i. It is assumed in the program that the verification file does not exist; if it already exists an error will occur. ii. Differences exists between the height levels at which the 3a-25 and 3a-26 statistics are calculated; for 3a-25 the levels are [2, 4, 6, 10, 15] km and the path-average while for 3a-26 the levels are [2, 4, 6] km and the path-average. iii. Presently, the height levels are being defined relative to the ellipsoid and not the local surface. iv. Resampling of the radar data from the range direction to the vertical is done differently in 3a-25 and 3a-26. In 3a-25, the estimate of the reflectivity factor at a particular height is done by a gaussian weighting of the range gates that intersect that height. 3a-26 uses only a single value of Z and R - that gate, the center of which, intersects the height of interest. v. The Z-R or 0th method refers to the zeroth order solution of the reflectivity factor from the basic weather radar equation. In this approximation, no compensation is made for attenuation so the reflectivity factor is directly proportional to the measured radar return power. This approximate reflectivity factor is sometimes called the apparent or measured reflectivity factor. In converting any estimate of the reflectivity factor, Z(est), to rain rate, R, the power-law approximation is used: R = a * Z(est) ** b where a and b are obtained from 2a-25. The HB or Hitschfeld-Bordan solution to the reflectivity factor, Z, is obtained by using an specific attenuation-reflectivity factor (k-Z) relationship and then solving the weather radar equation for Z. Description of the rain rates from 2a-25 is given in the documentation for this algorithm. 8. References [1] Meneghini, R., 1998, J. Appl. Meteor., 37, (in press). [2] Meneghini, R., and J. Jones, 1993, J. Appl. Meteor., 32, 386-398. [3] Short, D.A., K. Shimizu, B. Kedem, 1993, J. Appl. Meteor., 32, 182-192. [4] Kedem, B., L.S. Chiu, G.R. North, 1990, J. Geophys. Res., 96, 1965-1972. [5] Atlas, D., D. Rosenfeld, D.A. Short, 1990, J. Geophys. Res., 95, 2153-2160.