Category Archives: Statistics

Statistics job resources

Applying for an academic job is serious work.  I ended up lucky (though, as Louis Pasteur said, luck favors the prepared).  I received two job offers this season and took my first-choice job.  But I worked hard to get those offers.  I kept a detailed CV my entire student career (starting as a BA student, not waiting until job season to start), wrote an extensive teaching dossier for the 20 courses I’ve taught and my undergraduate tutoring experience, and developed a research statement as that vision became clearer to me.  Clearly, self-investment and personal excellence are the most important ingredients.

Next is to find people who want to hire you. Two sites and one magazine basically cover the bases for statistics.

1. If you’re a statistics student, you’re already a member of the ASA, right?  If so, the back of the AmStat News magazine has many jobs listed. http://magazine.amstat.org/
2. Many jobs are posted at the American Statistical Association (ASA) jobs website.  Subscribe to their feed in your RSS reader: http://jobs.amstat.org/search/results/index.cfm?SN=25&ss=1&display=rss While I have had my CV posted on the site for years, I’ve never received any contact because of it.  I think the more direct approach of networking or replying to specific jobs is more effective.
3. The University of Florida statistics website lists many jobs, too.  My impression is that this site is sometimes even more comprehensive than jobs.amstat.org. http://www.stat.ufl.edu/vlib/Index.html

I recommend subscribing to the jobs.amstat.org feed in your RSS reader, because then most of the jobs will come to you.  You can follow up at the UFlorida website to make sure you’re not missing anything.  Start looking in Sept/Oct and work on cover letters through Nov/Dec for the Dec/Jan/Feb deadlines.  Ask for letters of recommendation early (maybe even late summer while your professors are not busy with the semester).
Ask your advisor to look over your CV, cover letter, and other submission materials (scan a pdf of your unofficial transcript).  They’ve reviewed many applications when hiring in their department and will have good advice.  Send your application materials (all in pdf format, not doc!) as soon as you are ready, to help yours be near the top of their review pile.  And while your application is in the hands of many hiring committees, try not to sweat: you’ve done all you can and it’s largely out of your control until they ask for an interview (or send you a form rejection letter, or never respond to you at all).  Feel free to send a follow-up email to request status if it’s a week or so after their self-predicted decision deadline, if it will help calm your nerves, but try not to hassle them.  It’s a very challenging market and positions regularly get 80-300 applications, so everything you can do to rise to the top of that deep stack can make the difference between getting a toe in the door and the alternative.

Interviewing is the next step.  Here are some pages with questions to prepare for.  Write your answers down just as you’d say them and practice saying them aloud, maybe to a friend who will listen.  You want to clarify your answers to yourself and get them to flow smoothly out of your mouth.

10 tough interview questions
General advice

The job talk is the last step. LaTeX’s beamer package is very good (remove all the extra buttons, and reduce headers and footers to a minimum). Mac’s iWork Keynote is one of the best presentation packages. BBP was a great resource, provided you can ignore all the MSPP BS.

First five slides. Template. Video.
Matt Might’s presentation tips and job hunt advice.
CS Berkeley

Negotiating for your salary, start-up, teaching reduction, and more: ask your advisor for advice.  If you have a second offer, all of this becomes much, much easier!

tdllicor: estimates discrimination and other parameters associated with leaf photosynthesis

Together with David Hanson, I developed the R package tdllicor, which reads TDL and Licor files, aligns them, and calculates quantities of interest with bootstrap intervals.  It is currently private as it is specialized and not of general interest.  It has already been important for a number of conference publications and is used for active research.

Conference Publications

DT Pater, EB Erhardt, and DT Hanson. Photorespiratory and respiratory carbon isotope fractionation in leaves. In Proceedings of the Biophysical Society 55th Annual Meeting, Baltimore, MD, Mar 2010. Biophysical Society.

DT Pater, EB Erhardt, and DT Hanson. Isotopic signature of photorespiration. In Joint Annual Meetings of the American Society of Plant Biologists and the Canadian Society of Plant Physiologists, Montreal, CA, August 2010.

mortest: estimates the total number of carcasses at a windfarm

Working with Aaftab Jain, we developed an estimator for the total number of bird and bat carcasses at a windfarm, called “mortest”, and implemented it as an R package.

We are interested in estimating $$c$$, the total number of carcasses (mortalities) in a period (year). The total number of carcasses is the sum of carcasses over size classes, $$c=\sum_{s=1}^{S}c_s$$. If carcasses are retained (that is, not scavenged), searcher efficiency is perfect (every carcass is found), and every tower is searched, then each $$c_s$$ would be counted perfectly. Yet carcass scavenging by predators and searchers overlooking carcasses are a reality, making observed counts an underestimate. Furthermore, tower sampling rather than censusing is a cost-saving convenience.

Our estimator of total mortality, $$c$$, weights the estimates from different search intervals and adjusts the observed counts for scavenging, search efficiency, searchable area of each tower, and proportion of towers searched, accounting for uncertainty in these estimates using a bootstrap.

The software was written by Erik Erhardt and is currently private.  Contact Aaftab Jain <aaftabj+gmail.com> for more information on using the software.
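To make the adjustment idea concrete, here is a minimal Python sketch. It is not the mortest estimator: it handles a single size class and search interval, treats the persistence (not scavenged) probability, searcher efficiency, searchable-area fraction, and fraction of towers searched as known constants, and bootstraps only over towers. All function names and numbers are made up for illustration.

```python
import random


def adjusted_count(found, p_persist, p_detect, frac_area, frac_towers):
    """Scale an observed carcass count up by the chance a carcass is counted."""
    p_counted = p_persist * p_detect * frac_area * frac_towers
    return found / p_counted


def bootstrap_interval(tower_counts, rates, reps=2000, seed=1):
    """Percentile interval from resampling searched towers with replacement."""
    rng = random.Random(seed)
    n = len(tower_counts)
    estimates = []
    for _ in range(reps):
        resample = [rng.choice(tower_counts) for _ in range(n)]
        estimates.append(adjusted_count(sum(resample), *rates))
    estimates.sort()
    # 2.5th and 97.5th percentiles of the bootstrap estimates.
    return estimates[int(0.025 * reps)], estimates[int(0.975 * reps) - 1]


# Hypothetical data: carcasses found at each of 10 searched towers.
tower_counts = [0, 2, 1, 0, 3, 1, 0, 2, 2, 1]
# Hypothetical rates: persistence, searcher efficiency, area, towers searched.
rates = (0.8, 0.5, 0.6, 0.25)

estimate = adjusted_count(sum(tower_counts), *rates)
low, high = bootstrap_interval(tower_counts, rates)
print(f"estimated total: {estimate:.0f}, 95% interval: ({low:.0f}, {high:.0f})")
```

A fuller treatment would also resample the trials used to estimate the scavenging and searcher-efficiency rates, so that their uncertainty flows into the interval as well.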

Talk: ACASA Annual meeting 2011

I’ll be giving a shortened version of my Bayesian stable isotope mixing model talk (title and abstract below) at the Albuquerque Chapter of the American Statistical Association (ACASA) annual meeting on Friday, April 29, 2011. I gave two distinct longer versions of this talk recently as part of job interview talks at St. Louis University and the University of New Mexico.  I’m looking forward to the meeting to visit with people who I’ve worked with over the last several years, organizing judging events at science fairs, and other events.

A Bayesian Framework for Stable Isotope Mixing Models

Erik B. Erhardt, The Mind Research Network; Edward J. Bedrick, Division of Epidemiology and Biostatistics, University of New Mexico Health Sciences Center

Stable isotope sourcing is used to estimate proportional contributions of sources to a mixture, such as in the analysis of animal diets and plant nutrient use. Statistical methods for inference on the diet proportions using stable isotopes have focused on the linear mixing model. Existing frequentist methods provide inferences when the diet proportion vector can be uniquely solved for in terms of the isotope ratios. Bayesian methods apply for arbitrary numbers of isotopes and diet sources but existing models are somewhat limited as they assume that trophic fractionation or discrimination are estimated without error or that isotope ratios are uncorrelated. We present a Bayesian model for the estimation of mean diet that accounts for uncertainty in source means and discrimination and allows correlated isotope ratios. This model is easily extended to allow the diet proportion vector to depend on covariates, such as time. Two examples are used to illustrate the methodology.
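The uniquely solvable case mentioned in the abstract can be sketched in a few lines of Python: with two isotopes and three sources, the linear mixing model plus the constraint that the proportions sum to one gives three equations in three unknowns. The source values below are invented, are assumed to be already corrected for discrimination, and carry no uncertainty, which is exactly the simplification the Bayesian model in the talk relaxes.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("sources are collinear; proportions are not unique")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


# Hypothetical source means for three diet sources (columns).
d13c_sources = [-26.0, -20.0, -12.0]
d15n_sources = [4.0, 10.0, 6.0]
# Hypothetical mixture (consumer) isotope values.
d13c_mix, d15n_mix = -21.4, 6.2

# Two mixing equations plus the sum-to-one constraint.
system = [d13c_sources, d15n_sources, [1.0, 1.0, 1.0]]
proportions = solve_linear(system, [d13c_mix, d15n_mix, 1.0])
print([round(p, 3) for p in proportions])
```

With more sources than isotopes plus one, this system is underdetermined and has no unique solution, which is one motivation for the Bayesian treatment.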

Paper published: δ13C of soluble sugars in Tillandsia epiphytes

In a previous post I discussed this paper and how fun it was to write with Laurel.  Here I’m happy to report it’s available electronically (SpringerLink, pdf) and soon in paper. Laurel K. Goode, Erik B. Erhardt, Louis S. Santiago, Michael F. Allen.  Carbon stable isotopic composition of soluble sugars in Tillandsia epiphytes varies in response to shifts in habitat. Oecologia (2010) 163:583–590. DOI 10.1007/s00442-010-1577-5 Received: 11 March 2009 / Accepted: 25 January 2010 / Published online: 13 February 2010

Visions

A few important areas of focus, reflecting what I’m doing and where I’m going.

Professional

Statistics for Stable Isotope applications

My vision is to be the recognized leader of statistical methods in stable isotope sourcing.  This will be accomplished through publishing papers from my dissertation work, collaborations leading to publications on methodological extensions, and giving talks in university departments and at courses and conferences.

Postdoctoral fellowship at the Mind Research Network

At the MRN my vision is to be an exceptional statistician, a valuable member of Vince Calhoun’s team, and an expert on statistical methods applying to ICA and fMRI.  This will be accomplished with thorough discussions and detailed answers to statistical inquiries, active curiosity about others’ work and how I may contribute, and careful study of existing ICA models and sound application of statistical principles. My career goals at the MRN are to develop a broad and deep knowledge of the methods for analysis of fMRI data in particular, and brain imaging data in general, to publish carefully developed extensions in well-written papers, and make contributions to others’ work.  This will be accomplished by dissecting the modeling details from published work and uncovering further details by contacting the authors, appealing to theoretical results and experimental confirmation before publicizing new methods, and helping others consider their methods, results, and interpretations.

Personal

Dance

My vision is to contribute more to the Albuquerque contra dance community and bring dance to more people, especially youth.  This will be accomplished by making opportunities for new callers, writing and calling dances, leading and participating in workshops, helping make more dance and music opportunities to bring the community together, outreach efforts to introduce dance to more people, and always collaborating with our vibrant New Mexico dance community to make it happen.

Paper accepted: δ13C of soluble sugars in Tillandsia epiphytes vary in response to shifts in habitat

Laurel Goode, Erik Erhardt, Louis Santiago, and Michael Allen. δ13C of soluble sugars in Tillandsia epiphytes vary in response to shifts in habitat. Oecologia, Physiological ecology section, 2010.

I met Laurel at SIRFER 2008 where we enjoyed a wide range of stable isotope lectures and lab experience. She first used my software, SISUS, to estimate the proportion of C3 vs CAM photosynthesis of epiphytes. Our work and friendship led to the collaboration where we thought about and developed a model for the environmental factors affecting the photosynthetic pathways of the species studied.

Abstract

We studied carbon stable isotopic composition (δ13C) of bulk leaf tissue and extracted sugars of four epiphytic Tillandsia species to investigate flexibility in the use of crassulacean acid metabolism (CAM) and C3 photosynthetic pathways. Plants growing in two seasonally-dry tropical forest reserves in Mexico that differ in annual precipitation were measured during wet and dry seasons, and among secondary, mature, and wetland forest types within each site. Dry season sugars were more enriched in 13C than wet season sugars, but there was no seasonal difference in bulk tissues. Bulk tissue δ13C differed by species and by forest type, with values from open-canopied wetlands more enriched in 13C than mature or secondary forest types. The shifts within forest habitat were related to temporal and spatial changes in vapour pressure deficits (VPD). Modeling results estimate a possible 4% increase in the proportional contribution of the C3 pathway during the wet season, emphasizing that any seasonal or habitat-mediated variation in photosynthetic pathway appears to be quite moderate and within the range of isotopic effects caused by variation in stomatal conductance during assimilation through the C3 pathway and environmental variation in VPD. Carbon isotopic analysis of sugars together with bulk leaf tissue offers a useful approach for incorporating short- and long-term measurements of carbon isotope discrimination during photosynthesis.

Wishart distribution in WinBUGS, nonstandard parameterization

The Wishart distribution and especially the inverse-Wishart distribution are the source of some confusion because they occasionally appear with alternative parameterizations. Also, the Wishart distribution can be used to model a covariance matrix or a precision matrix (the inverse of a covariance matrix) in different situations, and the inverse-Wishart the same, but the other way round. It’s already becoming complicated. Hal Stern, coauthor of Bayesian Data Analysis (BDA), helped to clarify many issues for me in an email conversation. In this post I hope to clarify the differences in Wishart parameterizations of BDA, the Wikipedia pages, and the WinBUGS and OpenBUGS software, and show an example in OpenBUGS where the inverse parameterization has to be specified relative to the distribution’s definition for the correct posterior to result.

The Wishart distribution commonly arises in the context of the Bayesian multivariate normal model, so first I detail that model. Let the $$d$$-dimensional vectors $$y_i$$, $$i=1,\ldots,n$$, be independent with $$y_i|\mu,\Sigma\sim\textrm{Normal}(\mu,\Sigma)$$, where $$\mu$$ is a (column) vector of length $$d$$ and $$\Sigma$$ is a $$d\times d$$ covariance matrix, which is symmetric and positive definite. The conjugate priors for the parameters are $$\mu|\mu_0,\Sigma,\kappa_0\sim\textrm{Normal}(\mu_0,\Sigma/\kappa_0)$$, where $$\mu_0$$ is the prior mean and $$\kappa_0$$ is the number of prior measurements on the $$\Sigma$$ scale, and $$\Sigma|\Sigma_0,\nu_0\sim\textrm{Inv-Wishart}(\Sigma_0,\nu_0)$$, where $$\Sigma_0$$ is the prior covariance matrix with $$\nu_0$$ degrees of freedom.

I’ll follow the definition of the Wishart and inverse-Wishart distributions used in BDA and Wikipedia.
$$W|A,\nu\sim\textrm{Wishart}(A,\nu)$$ if $$p(W|A,\nu)\propto|A|^{-\nu/2}|W|^{(\nu-d-1)/2}\exp\left(-\frac{1}{2}\textrm{tr}(A^{-1}W)\right)$$, with expected value $$\textrm{E}(W)=\nu A$$ (notation: $$\textrm{tr}(X)$$ indicates the trace of matrix $$X$$, and $$X^{T}$$ its transpose). The integral is finite when $$\nu\ge d$$ and the density is finite if $$\nu\ge d+1$$.

In this definition $$A$$ and $$W$$ are on the same scale. $$A$$ and $$W$$ are both covariance matrices when modeling a sample covariance matrix. Specifically, the scatter matrix $$S$$, where $$S=\sum_{i=1}^n(y_i-\bar{y})(y_i-\bar{y})^{T}$$, has $$S|V,n\sim\textrm{Wishart}(V,n-1)$$, where $$V$$ is the population covariance and $$n$$ is the sample size informing $$S$$ (one degree of freedom is lost because deviations are taken about $$\bar{y}$$ rather than $$\mu$$). $$A$$ and $$W$$ are both precision matrices when modeling the inverse covariance (precision) matrix ($$\tau\equiv\Sigma^{-1}$$) of the normal model above. That is, $$\tau|\tau_0,\nu_0\sim\textrm{Wishart}(\tau_0,\nu_0)$$, where $$\tau_0$$ is the inverse of the prior population covariance and $$\nu_0$$ can be thought of as the prior sample size.

If $$B^{-1}|A,\nu\sim\textrm{Wishart}(A,\nu)$$ (where $$B$$ is a covariance matrix and $$B^{-1}$$ and $$A$$ are precision matrices) then $$B|A^{-1},\nu\sim\textrm{Inv-Wishart}(A^{-1},\nu)$$.  Now, with $$W$$ and $$A$$ as covariance matrices, $$W|A,\nu\sim\textrm{Inv-Wishart}(A,\nu)$$ if $$p(W|A,\nu)\propto|A|^{\nu/2}|W|^{-(\nu+d+1)/2}\exp\left(-\frac{1}{2}\textrm{tr}(AW^{-1})\right)$$, with expected value $$\textrm{E}(W)=A/(\nu-d-1)$$.

One point of confusion was when using WinBUGS and OpenBUGS to fit the Bayesian multivariate normal model. I’ll call the WinBUGS parameterization of the Wishart the “BUGS-Wishart” with a covariance parameter ($$R$$) and a precision random matrix ($$W$$) (these are on inverse scales from one another, where our standard definition above had the parameter and random matrix on the same scale).
$$W|R,\nu\sim\textrm{BUGS-Wishart}(R,\nu)$$ if $$p(W|R,\nu)\propto|R|^{\nu/2}|W|^{(\nu-d-1)/2}\exp\left(-\frac{1}{2}\textrm{tr}(RW)\right)$$, with expected value $$\textrm{E}(W)=\nu R^{-1}$$. By itself, having the parameter on the covariance scale rather than the precision scale is straightforward; however, it becomes misleading when incorporated into the normal model. For example, I expected a non-informative prior to have covariance matrix $$R$$ with large diagonal elements. But the following example shows the need for a Bayesian to write down the full posterior to see how the prior and likelihood combine in the posterior.

The second point of confusion was in the prior distributional specification for precision matrix $$\tau$$ in the OpenBUGS Camel example. This example has replicated structured missing bivariate data at $$\pm 2$$, but four complete-data points at the corners of a square ($$\pm 1$$) centered at zero. The camel example is named so because the posterior for $$\rho$$ (rho) has two humps; two of the complete observations support correlation $$+1$$ and two support correlation $$-1$$. The data have a normal distribution parameterized with precision matrix, $$\tau$$, as $$y_i\sim\textrm{Normal}([0,0]^{T},\tau)$$, and precision matrix $$\tau\sim\textrm{Wishart}(R,2)$$, where $$R=\textrm{diag}(0.001,0.001)$$. The correlation $$\rho$$ is calculated from $$\tau$$ by inverting $$\tau$$ to get the covariance matrix $$\Sigma$$ and taking the off-diagonal covariance divided by the square root of the product of the on-diagonal variances. Because the BUGS-Wishart distribution has a covariance parameter $$R$$, I was surprised at their use of a noninformative prior parameter matrix with diagonal values of 0.001 (instead of variance values of 1000).
Hal Stern helped clarify why the prior they use is the correct one by writing out the posterior distribution based on the BUGS-Wishart density and the normal likelihood (which is parameterized in terms of precision $$\tau$$). Looking at the exponential kernel of the BUGS-Wishart distribution, we have $$\exp\left(-\frac{1}{2}\textrm{tr}(R\tau)\right)$$. The kernel of the likelihood is $$\exp\left(-\frac{1}{2}\sum_{i=1}^n(y_i-\mu)^{T}\tau(y_i-\mu)\right)=\exp\left(-\frac{1}{2}\textrm{tr}(S_0\tau)\right)$$, where $$S_0=\sum_{i=1}^n(y_i-\mu)(y_i-\mu)^{T}$$ is the “sums of squares” relative to $$\mu$$. Thus, when the prior and likelihood are combined, the exponential kernel of the posterior for $$\tau$$ is $$\exp\left(-\frac{1}{2}\textrm{tr}((R+S_0)\tau)\right)$$, which is BUGS-Wishart with parameter $$R+S_0$$. This is the correct noninformative analysis since the near-zero matrix $$R$$ from the prior contributes little to the sum with $$S_0$$ informed by the data. However, it requires looking ahead to the form of the posterior to see that the parameter for the BUGS-Wishart needs to be specified as a precision matrix INSTEAD of a covariance matrix!

Dave Lunn (first author of the reference for WinBUGS) has written to me that he will try to clarify matters in the next version. I would like to see a second version of the Wishart implemented in WinBUGS and OpenBUGS with the parameter and the random matrix on the same scale, as in BDA, to avoid this awkward inverse specification of the parameter for the BUGS-Wishart distribution.
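A small numerical check of this argument, as a Python sketch under simplifying assumptions: it uses only the four complete corner points (ignoring the missing-data part of the Camel example), takes the mean as known at zero, and uses the conjugate result that the posterior is BUGS-Wishart with parameter $$R+S_0$$ and $$\nu+n$$ degrees of freedom, so that $$\textrm{E}(\tau|y)=(\nu+n)(R+S_0)^{-1}$$. The function names are made up for illustration.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def posterior_mean_precision(r, data, nu):
    """E(tau | data) = (nu + n) (R + S0)^{-1} for zero-mean bivariate data."""
    n = len(data)
    # S0: sums of squares and cross-products about the known mean (0, 0).
    s0 = [[sum(y[i] * y[j] for y in data) for j in range(2)] for i in range(2)]
    r_plus_s0 = [[r[i][j] + s0[i][j] for j in range(2)] for i in range(2)]
    inv = inverse_2x2(r_plus_s0)
    return [[(nu + n) * inv[i][j] for j in range(2)] for i in range(2)]


# The four complete observations at the corners of the square.
corners = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
small_r = [[0.001, 0.0], [0.0, 0.001]]    # the prior used in the Camel example
large_r = [[1000.0, 0.0], [0.0, 1000.0]]  # the prior I first expected

mean_small = posterior_mean_precision(small_r, corners, nu=2)
mean_large = posterior_mean_precision(large_r, corners, nu=2)
print("posterior mean precision, R = 0.001 I:", mean_small[0][0])
print("posterior mean precision, R = 1000 I: ", mean_large[0][0])
```

With the small $$R$$, the posterior mean precision is driven almost entirely by $$S_0$$; with the large $$R$$, the prior term dominates the sum $$R+S_0$$ and the data barely register.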

Paper accepted: Stable Isotope collaboration, Chris Bickford

Christopher P. Bickford, Nate G. McDowell, Erik Barry Erhardt, Heath H. Powers, David T. Hanson. (2009) “High frequency field measurements of diurnal carbon isotope discrimination and internal conductance in a semi-arid species, Juniperus monosperma”. Plant, Cell & Environment, online Volume 32, Issue 7, pages 796–810, July 2009.

Chris Bickford, PhD candidate UNM Biology, and I met when we attended Iso-Camp at Jim Ehleringer’s lab at U Utah Summer 2008.  On the flight home we started discussing a challenge he was facing in his first of three dissertation papers. He studies details of plant photosynthesis.  He had complicated expressions for leaf carbon isotope discrimination $$\Delta$$ and internal conductance $$g_i$$ based on concentrations of the CO$$_2$$ isotopologues $$^{13}C^{16}O^{16}O$$ and $$^{12}C^{16}O^{16}O$$. He needed to propagate the variation of the CO$$_2$$ measurements into his variables of interest, $$\Delta$$ and $$g_i$$.  He also needed to compare his accurate and precise measurements using tunable diode laser spectroscopy (TDL) to predictions from three models.

There were a number of statistical issues.  One was how to make model and observation comparisons.  I suggested using RMSE since it includes both variance and bias in a single measure.  The main issue was the incorporation of variation from the CO$$_2$$ measurements into the quantities of interest.  The bootstrap allowed us to do this.  There were a number of programming sessions in R to write functions and scripts to do all the calculations, create plots, output spreadsheets of results, and so on.  Chris has become a convert from Excel to R over the course of this project.  The methods implemented for this paper will likely flow into later pubs for both Chris and Dave.

Chris has taken a postdoc in New Zealand, where he and his wife, Karen, will spend the next two years with their dog.  He defends his dissertation on April 13th.
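The remark that RMSE includes both variance and bias can be checked directly: the mean squared error equals the squared mean error (bias) plus the variance of the errors. The short Python sketch below uses invented observed and predicted values, not data from the paper.

```python
import math


def rmse_parts(predicted, observed):
    """Return RMSE along with the bias and variance of the prediction errors."""
    errors = [p - o for p, o in zip(predicted, observed)]
    n = len(errors)
    bias = sum(errors) / n
    # Variance of the errors about their mean (divisor n, to match the MSE).
    variance = sum((e - bias) ** 2 for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return rmse, bias, variance


# Hypothetical observed values and model predictions.
observed = [18.2, 19.1, 20.4, 21.0, 19.8]
predicted = [19.0, 19.5, 21.5, 21.2, 20.9]

rmse, bias, variance = rmse_parts(predicted, observed)
print(f"RMSE^2 = {rmse ** 2:.4f}, bias^2 + variance = {bias ** 2 + variance:.4f}")
```

A model can therefore score poorly on RMSE either by being consistently off (bias) or by scattering widely around the observations (variance), which is why a single RMSE value is a convenient summary for comparing models.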