Monday, August 7, 2017

Confidence Intervals: Fad or Fashion

Confidence intervals seem to be the fad among some in pop stats/data science/analytics. Whenever there is mention of p-hacking, or the ills of publication standards, or the pitfalls of null hypothesis significance testing, CIs almost always seem to be the popular solution.

There are some attractive features of CIs. This paper provides some alternative views of CIs, discusses some strengths and weaknesses, and ultimately proposes that they are, on balance, superior to p-values and hypothesis testing. CIs can bring more information to the table in terms of effect sizes for a given sample; however, some of the statements made in this article need to be read with caution. I just wonder how much of the fascination with CIs is the result of confusing a Bayesian interpretation with a frequentist application, or just sloppy misinterpretation. I also completely disagree that they are more straightforward for students to interpret than hypothesis tests and p-values (as the article claims).

Dave Giles gives a very good review, starting with the very basics of what a parameter is vs. an estimator vs. an estimate, sampling distributions, etc. After reviewing the concepts key to understanding CIs, he points out two very common interpretations of CIs that are clearly wrong:

1) There's a 95% probability that the true value of the regression coefficient lies in the interval [a,b].
2) This interval includes the true value of the regression coefficient 95% of the time.

"we really should talk about the (random) intervals "covering" the (fixed) value of the parameter. If, as some people do, we talk about the parameter "falling in the interval", it sounds as if it's the parameter that's random and the interval that's fixed. Not so!"

In Robust misinterpretation of confidence intervals, the authors take on the idea that confidence intervals offer a panacea for interpretation issues related to null hypothesis significance testing (NHST):

"Confidence intervals (CIs) have frequently been proposed as a more useful alternative to NHST, and their use is strongly encouraged in the APA Manual...Our findings suggest that many researchers do not know the correct interpretation of a CI....As is the case with p-values, CIs do not allow one to make probability statements about parameters or hypotheses."

The authors present evidence of this misunderstanding by presenting subjects with a number of false statements regarding confidence intervals (including the two above pointed out by Dave Giles) and noting how frequently subjects incorrectly endorsed them as true.

In Osteoarthritis and Cartilage, the authors write:

"In spite of frequent discussions of misuse and misunderstanding of probability values (P-values) they still appear in most scientific publications, and the disadvantages of erroneous and simplistic P-value interpretations grow with the number of scientific publications."

They raise a number of issues related to both p-values and confidence intervals (multiplicity of testing, the focus on effect sizes, etc.), and they point out some informative differences between using p-values vs. using standard errors to produce 'error bars.' However, in trying to clarify the advantages of confidence intervals over p-values, they step really close to what might be considered an erroneous and simplistic interpretation:

"the great advantage with confidence intervals is that they do show what effects are likely to exist in the population. Values excluded from the confidence interval are thus not likely to exist in the population. "

Maybe I am being picky, but if we are going to be picky about interpreting p-values then the same goes for CIs. It sounds a lot like they are talking about 'a parameter falling into an interval' or the 'probability of a parameter falling into an interval,' which Dave cautions against. They seem careful enough in their language, using the term 'likely' rather than making strong probability statements, so maybe they are making a more heuristic interpretation that, while useful, may not be the most correct.

In Mastering 'Metrics, Angrist and Pischke give a great interpretation of confidence intervals that, in my opinion, doesn't lend itself as easily to abusive probability interpretations:

"By describing a set of parameter values consistent with our data, confidence intervals provide a compact summary of the information these data contain about the population from which they were sampled"

I think the authors of the Osteoarthritis and Cartilage article could have stated their case better if they had said:

"The great advantage of confidence intervals is that they describe what effects in the population are consistent with our sample data. Our sample data is not consistent with population effects excluded from the confidence interval."

Both hypothesis testing and confidence intervals are statements about the compatibility of our observable sample data with population characteristics of interest. The ASA released a set of clarifying principles about p-values. Number 2 states that "P-values do not measure the probability that the studied hypothesis is true." Nor does a confidence interval (again see Ranstam, 2012).

Venturing into the risky practice of making imperfect analogies, consider this loosely from the perspective of criminal investigations. We might think of confidence intervals as narrowing the range of suspects based on observed evidence, without providing specific probabilities related to the guilt or innocence of any particular suspect. Better evidence narrows the list, just as better evidence in our sample data (less noise) will narrow the confidence interval.

I see no harm in CIs, and more good if they draw more attention to the practical/clinical significance of effect sizes. But I think the temptation to incorrectly represent CIs can be just as strong as the temptation to speak boldly of 'significant' findings following an exercise in p-hacking or in the face of meaningless effect sizes. Maybe some sins are greater than others, and proponents feel more comfortable with misinterpretations/overinterpretations of CIs than they do with misinterpretations/overinterpretations of p-values.

Or as Briggs concludes about this issue:

"Since no frequentist can interpret a confidence interval in any but in a logical probability or Bayesian way, it would be best to admit it and abandon frequentism"


Brandstätter, E. (1999). Confidence Intervals as an Alternative to Significance Testing. Methods of Psychological Research Online, 4(2). PABST Science Publishers.

Ranstam, J. (2012). Why the P-value culture is bad and confidence intervals a better alternative. Osteoarthritis and Cartilage, 20(8), 805-808. http://dx.doi.org/10.1016/j.joca.2012.04.001 (http://www.sciencedirect.com/science/article/pii/S1063458412007789)

Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E.-J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review. DOI: 10.3758/s13423-013-0572-3

Friday, July 21, 2017

Regression as a variance based weighted average treatment effect

In Mostly Harmless Econometrics, Angrist and Pischke discuss regression in the context of matching. Specifically, they show that regression provides a variance-based weighted average of covariate-specific differences in outcomes between treatment and control groups, while matching gives us a weighted average of those differences weighted by the empirical distribution of covariates (see more here). I wanted to roughly sketch this logic out below.

Matching

δATE = E[y1i | Xi, Di = 1] − E[y0i | Xi, Di = 0] = ATE

This gives us the average difference in mean outcomes for treatment and control when (y1i, y0i ⊥ Di), i.e., in a randomized controlled experiment, where potential outcomes are independent of treatment status.

We represent the matching estimator empirically by:

Σ δx P(Xi = x), where δx is the difference in mean outcome values between treatment and control units at a particular value of X, or the difference in outcomes for a particular combination of covariates, assuming (y1, y0 ⊥ Di | Xi), i.e., conditional independence; hence identification is achieved through a selection-on-observables framework.


Average differences δx are weighted by the distribution of covariates via the term P(Xi = x).

Regression

We can represent a regression parameter using the basic formula taught to most undergraduates:

Single Variable: β = cov(y,D)/v(D)
Multivariable:  βk = cov(y,D*)/v(D*)

where D* is the residual from a regression of D on all other covariates. More generally, E(X'X)^-1 E(X'y) is a vector whose kth element is cov(y, x*)/v(x*), where x* is the residual from a regression of that particular 'x' on all other covariates.
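This is the Frisch-Waugh-Lovell partialling-out result. Here is a minimal R sketch (mine, using the demo data from the R code at the end of this post) confirming it numerically:

# verify the partialling-out identity using the demo data from the end of this post
x <- c(4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9)
d <- c(1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0)
y <- c(6,7,8,8,9,11,12,13,14,2,3,4,5,6,7,8,9,10)

d_star <- resid(lm(d ~ x))       # D* = residual from regressing D on the other covariate
cov(y, d_star)/var(d_star)       # cov(y,D*)/v(D*) ...
coef(lm(y ~ x + d))["d"]         # ... equals the coefficient on d from the full regression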

We can then represent the estimated treatment effect from regression as:

δR = cov(y, D*)/v(D*) = E[(Di − E[Di|Xi]) E[yi | Di, Xi]] / E[(Di − E[Di|Xi])^2], assuming (y1, y0 ⊥ Di | Xi)

Again regression and matching rely on similar identification strategies based on selection on observables/conditional independence.

Let E[yi | Di, Xi] = E[yi | Di = 0, Xi] + δx Di

Then with more algebra we get: δR = cov(y, D*)/v(D*) = E[σ^2_D(Xi) δx] / E[σ^2_D(Xi)]

where σ^2_D(Xi) = E[(Di − E[Di|Xi])^2 | Xi] is the conditional variance of treatment D given X.

While the algebra is cumbersome and notation heavy, we can see that the familiar way of viewing a regression estimate, cov(y, D*)/v(D*), is equivalent (in expectation) to the term E[σ^2_D(Xi) δx] / E[σ^2_D(Xi)], and that this term weights the covariate-specific differences δx between treatment and control by the conditional variance of D.

Hence, regression gives us a variance-weighted average treatment effect, whereas matching provides a distribution-weighted average treatment effect.

So what does this mean in practical terms? Angrist and Pischke explain that regression puts more weight on covariate cells where the conditional variance of treatment status is greatest, i.e., where there are roughly equal numbers of treated and control units. They also note that the difference between the two weighting schemes matters little when the variation in δx across covariate combinations is minimal.
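To make the weighting concrete, here is a minimal R sketch (mine, using the contrived demo data from the R code at the end of this post) that computes the cell-specific differences δx, weights them two ways, and checks the variance-weighted version against the coefficient on d from a regression saturated in x:

# demo data from the R code at the end of this post
x <- c(4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9)
d <- c(1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0)
y <- c(6,7,8,8,9,11,12,13,14,2,3,4,5,6,7,8,9,10)

cells   <- sort(unique(x))
delta_x <- sapply(cells, function(v) mean(y[d == 1 & x == v]) - mean(y[d == 0 & x == v]))
p_x     <- sapply(cells, function(v) mean(x == v))                       # P(X = x)
var_d   <- sapply(cells, function(v) {p <- mean(d[x == v]); p*(1 - p)})  # conditional variance of D

keep <- !is.na(delta_x)   # cells containing both treated and control units (common support)

# matching: delta_x weighted by the covariate distribution
sum(delta_x[keep] * p_x[keep]) / sum(p_x[keep])

# regression: delta_x weighted by the conditional variance of D (times the cell shares)
sum(delta_x[keep] * var_d[keep] * p_x[keep]) / sum(var_d[keep] * p_x[keep])

# the variance-weighted average equals the coefficient on d from a saturated regression
coef(lm(y ~ factor(x) + d))["d"]

In this balanced toy example every common-support cell has exactly one treated and one control unit, so the two weighted averages happen to coincide; with unbalanced cells the distribution-weighted and variance-weighted averages would generally differ.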

In his post The cardinal sin of matching, Chris Blattman puts it this way:

"For causal inference, the most important difference between regression and matching is what observations count the most. A regression tries to minimize the squared errors, so observations on the margins get a lot of weight. Matching puts the emphasis on observations that have similar X’s, and so those observations on the margin might get no weight at all....Matching might make sense if there are observations in your data that have no business being compared to one another, and in that way produce a better estimate" 

Below is a very simple contrived example (the data are generated in the R code at the end of this post). We can see that those in the treatment group tend to have higher outcome values, so a straight comparison between treatment and control will overestimate the treatment effect due to selection bias:

E[Yi | Di = 1] − E[Yi | Di = 0] = E[Y1i − Y0i | Di = 1] + {E[Y0i | Di = 1] − E[Y0i | Di = 0]}

In this example the naive comparison of means gives a difference of 3.78. However, if we estimate differences based on an exact matching scheme, we get a much smaller estimate of .67. If we run a regression using all of the data we get .75. If we consider 3.78 to be biased upward, then both matching and regression have substantially reduced it, and depending on the application the difference between .67 and .75 may not be of great consequence. Of course, if we run the regression using only the matched observations, we get exactly the same result as matching (see the R code below). This is not so different from the method of trimming based on propensity scores suggested in Angrist and Pischke.


Both methods rely on the same assumptions for identification, so no one can argue superiority of one method over the other with regard to identification of causal effects.

Matching has the advantage of being nonparametric, alleviating concerns about functional form. However, there are lots of considerations to work through in matching (e.g., 1:1 vs. 1:many matching, optimal caliper width, the variance/bias tradeoff, kernel selection, etc.). While all of these possibilities might lead to better estimates, I wonder if they don't sometimes lead to a garden of forking paths.

See also: 

For a neater set of notes related to this post, see:

Matt Bogard. "Regression and Matching (3).pdf" Econometrics, Statistics, Financial Data Modeling (2017). Available at: http://works.bepress.com/matt_bogard/37/

Using R MatchIt for Propensity Score Matching

R Code:

# generate demo data
x <- c(4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9)
d <- c(1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0)
y <- c(6,7,8,8,9,11,12,13,14,2,3,4,5,6,7,8,9,10)

summary(lm(y~x+d)) # regression controlling for x
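The code above only runs the regression on the full data. Here is a minimal sketch (mine, not from the original post) of how the other numbers quoted above can be reproduced from the same demo data:

mean(y[d==1]) - mean(y[d==0])  # naive difference in means (~3.78), inflated by selection bias

# exact matching on x: average the within-x differences over the common support (~.67)
common <- intersect(x[d==1], x[d==0])
mean(sapply(common, function(v) mean(y[d==1 & x==v]) - mean(y[d==0 & x==v])))

# regression using only the matched (common support) observations gives the same ~.67
summary(lm(y ~ x + d, subset = x %in% common))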

Wednesday, July 12, 2017

Instrumental Variables and LATE

Often in program evaluation we are interested in estimating the average treatment effect (ATE).  This is in theory the effect of treatment on a randomly selected person from the population. This can be estimated in the context of a randomized controlled trial (RCT) by a comparison of means between treated and untreated participants.

However, sometimes in a randomized experiment some members selected for treatment may not actually receive it (if participation is voluntary, for example, as in the Oregon Medicaid expansion). In this case, researchers will sometimes compare outcomes between those assigned to treatment and those assigned to control. This as-assigned (or as-randomized) analysis is referred to as an intent-to-treat (ITT) analysis. With perfect compliance, ITT = ATE.

As discussed previously, using treatment assignment as an instrumental variable (IV) is another approach to estimating treatment effects. The resulting estimate is referred to as a local average treatment effect (LATE).

What is LATE and how does it give us an unbiased estimate of causal effects?

In simplest terms, LATE is the ATE for the sub-population of compliers in an RCT (or other natural experiment where an instrument is used).

In a randomized controlled trial you can characterize participants as follows: (see this reference from egap.org for a really great primer on this)

Never Takers: those that refuse treatment regardless of treatment/control assignment.

Always Takers: those that get the treatment even if they are assigned to the control group.

Defiers: those that get the treatment when assigned to the control group and do not receive treatment when assigned to the treatment group. (These people violate an IV assumption referred to as monotonicity.)

Compliers: those that receive treatment if assigned to the treatment group but do not receive treatment when assigned to the control group.

The outcomes for never takers are the same regardless of treatment assignment, so they in effect cancel out in an IV analysis. As discussed by Angrist and Pischke in Mastering 'Metrics, the always takers are prime suspects for creating bias in non-compliance scenarios. These folks are typically the more motivated participants and likely would have higher potential outcomes, or potentially a greater benefit from treatment, than other participants. The compliers are characterized as participants that receive treatment only as a result of random assignment. The estimated treatment effect for these folks is often very desirable, and in an IV framework it can give us an unbiased causal estimate of the treatment effect. This is what is referred to as a local average treatment effect, or LATE.

How do we estimate LATE with IVs?

One way LATE estimates are often described is as dividing the ITT effect by the share of compliers. This can also be done in a regression context. Let D be an indicator equal to 1 if treatment is received vs. 0, and let Z be our indicator (0,1) for the original randomization i.e. our instrumental variable. We first regress:

D = β0 + β1 Z + e  

This captures all of the variation in our treatment that is related to our instrument Z, or random assignment. This is 'quasi-experimental' variation, and β1 is also an estimate of the rate of compliance. β1 only picks up the variation in treatment D that is related to Z and leaves all of the variation and unobservable factors related to self-selection (i.e., bias) in the residual term. You can think of this as the filtering process. We can represent this as: β1 = COV(D,Z)/V(Z).

Then, to relate changes in Z to changes in our target Y we estimate β2  or COV(Y,Z)/V(Z).

Y = β0 + β2 Z + e
Our instrumental variable estimator then becomes:
βIV = β2 / β1, or (Z’Z)^-1 Z’Y / (Z’Z)^-1 Z’D, or COV(Y,Z)/COV(D,Z)

The last term gives us the total proportion of ‘quasi-experimental variation’ in D related to Y. We can also view this through a 2SLS modeling strategy:


Stage 1: Regress D on Z to get the fitted values D*: D = β0 + β1 Z + e

Stage 2: Regress Y on D*: Y = β0 + βIV D* + e

 As described in Mostly Harmless Econometrics, "Intuitively, conditional on covariates, 2SLS retains only the variation in s [D  in our example above] that is generated by quasi-experimental variation- that is generated by the instrument z"

Regardless of how you want to interpret βIV, we can see that it teases out only the variation in our treatment D that is unrelated to selection bias and relates it to Y, giving us a less biased estimate of the treatment effect of D.

The causal path can be represented as:

Z → D → Y
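Here is a minimal simulation sketch (mine, with an assumed data generating process, not from any of the cited texts) showing that the ratio of the reduced form to the first stage recovers the LATE under one-sided noncompliance:

# RCT with one-sided noncompliance: LATE = ITT / compliance rate
set.seed(123)
n <- 10000
z <- rbinom(n, 1, 0.5)                      # random assignment (the instrument)
complier <- rbinom(n, 1, 0.6)               # 60% of subjects are compliers; the rest never take
d <- z * complier                           # treatment received (no always takers here)
y <- 1 + 2*d + 0.5*complier + rnorm(n)      # true effect = 2; compliers differ at baseline

itt <- mean(y[z == 1]) - mean(y[z == 0])    # reduced form: effect of assignment
fs  <- mean(d[z == 1]) - mean(d[z == 0])    # first stage: compliance rate
itt / fs                                    # Wald / IV estimate of LATE, close to 2
cov(y, z) / cov(d, z)                       # same estimate via COV(Y,Z)/COV(D,Z)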

There are lots of other ways to think about how to interpret IVs. Ultimately they provide us with an estimate of the LATE, which can be interpreted as the average causal effect of treatment for those participants in a study whose enrollment status is determined completely by Z (the treatment assignment), i.e., the compliers, and this is often a very relevant effect of interest.

Marc Bellemare has some really good posts related to this see here, here, and here.


Tuesday, July 11, 2017

The Credibility Revolution in Econometrics

Previously I wrote about how graduate training (and experience) can provide a foundation for understanding statistics, experimental design, and interpretation of research. I think this is common across many master's and doctoral level programs. But some programs approach this a little differently than others. Because of the credibility revolution in economics, there is a special concern for identification and robustness. And even within the discipline, there is concern that this has not been given enough emphasis in modern textbooks and curricula (see here and here). However, this may not be well understood or appreciated by those outside the discipline.

What is the credibility revolution and what does it mean in terms of how we do research?

I like to look at this through the lens of applied economists working in the field:

Economist Jayson Lusk puts it well:

"Fortunately economics (at least applied microeconomics) has undergone a bit of credibility revolution.  If you attend a research seminar in virtually any economist department these days, you're almost certain to hear questions like, "what is your identification strategy?" or "how did you deal with endogeneity or selection?"  In short, the question is: how do we know the effects you're reporting are causal effects and not just correlations."

Healthcare Economist Austin Frakt has a similar take:

"A “research design” is a characterization of the logic that connects the data to the causal inferences the researcher asserts they support. It is essentially an argument as to why someone ought to believe the results. It addresses all reasonable concerns pertaining to such issues as selection bias, reverse causation, and omitted variables bias. In the case of a randomized controlled trial with no significant contamination of or attrition from treatment or control group there is little room for doubt about the causal effects of treatment so there’s hardly any argument necessary. But in the case of a natural experiment or an observational study causal inferences must be supported with substantial justification of how they are identified. Essentially one must explain how a random experiment effectively exists where no one explicitly created one."

How do we get substantial justification? Angrist and Pischke give a good example in their text Mostly Harmless Econometrics, in their discussion of fixed effects and lagged dependent variables:

"One answer, as always is to check the robustness of your findings using alternative identifying assumptions. That means you would like to find broadly similar results using plausible alternative models." 

To someone trained in the physical or experimental sciences, this might 'appear' to look like data mining. But Marc Bellemare makes a strong case that it is not!

"Unlike experimental data, which often allow for a simple comparison of means between treatment and control groups, observational data require one to slice the data in many different ways to make sure that a given finding is not spurious, and that the researchers have not cherry-picked their findings and reported the one specification in which what they wanted to find turned out to be there. As such, all those tables of robustness checks are there to do the exact opposite of data mining."

That's what the credibility revolution is all about.

See also: 

Do Both! (by Marc Bellemare)
Applied Econometrics
Econometrics, Multiple Testing, and Researcher Degrees of Freedom








Monday, July 10, 2017

The Value of Graduate Education....and Experience

Some time ago I wrote a piece titled "Why Study Agricultural and Applied Economics." While this was somewhat geared toward graduate study, degrees in these areas provide a great combination of quantitative and analytical skills at the undergraduate level, suitable for a number of roles in industry, especially when combined with programming skills in R, SAS, or Python (just think Nate Silver). Another example would be the number of financial analyst and risk management and modeling roles held by graduates with bachelor's degrees in economics, finance, or related fields. Not everyone needs to be a PhD-holding rocket scientist to do complex analytical work in applied fields.

However, what are some arguments for graduate study? I bring this up because sometimes I wonder: given my role in the private sector, could I have had a similar trajectory if I had skipped the time, money, and energy spent on graduate school and gone straight to writing code?

Perhaps. But recently I was listening to a Talking Biotech podcast with Kevin Folta discussing the movie Food Evolution. Toward the end they discussed some critiques of the film, including a common critique of research in general: bias due to conflicts of interest. Kevin states:

"I've trained for 30 years to be able to understand statistics and experimental design and interpretation...I'll decide based on the quality of the data and the experimental design....that's what we do."

Besides taking on the criticisms of science, this emphasized two important points.

1) Graduate study teaches you to understand statistics and experimental design and interpretation. At the undergraduate level I learned some basics that were quite useful in terms of empirical work. In graduate school I learned what is analogous to a new language. The additional properties of estimators, proofs, and theorems taught in graduate statistics courses suddenly made the things I learned before make better sense. This background helped me to translate and interpret other people's work and learn from it, and learn new methodologies or extend others. But it was the seminars and applied research that made it come to life. Learning to 'do science' through statistics and experimental design. And interpretation as Kevin says.

2) Graduate study is an extendable framework. Learning and doing statistics is a career-long process. This recognizes the gulf between textbook and applied econometrics.


Sunday, June 11, 2017

Instrumental Variables vs. Intent to Treat

 "ITT analysis includes every subject who is randomized according to randomized treatment assignment. It ignores noncompliance, protocol deviations, withdrawal, and anything that happens after randomization. ITT analysis is usually described as “once randomized, always analyzed”.

"ITT analysis avoids overoptimistic estimates of the efficacy of an intervention resulting from the removal of non-compliers by accepting that noncompliance and protocol deviations are likely to occur in actual clinical practice" 
- Gupta, 2011

 In Mastering Metrics, Angrist and Pischke describe intent-to-treat analysis:

"In randomized trials with imperfect compliance, when treatment assignment differs from treatment delivered, effects of random assignment...are called intention-to-treat (ITT) effects. An ITT analysis captures the causal effect of being assigned to treatment."

While treatment assignment is random, non-compliance is not! Therefore, if instead of making intent-to-treat comparisons we compared those actually treated to those untreated, we would get biased results, because this essentially makes uncontrolled comparisons between treated and untreated subjects.

Angrist and Pischke describe how instrumental variables can be used in this context:

 “The instrumental variables (IV) method harnesses partial or incomplete random assignment, whether naturally occurring or generated by researchers"

 "Instrumental variable methods allow us to capture the causal effect of treatment on the treated in spite of the nonrandom compliance decisions made by participants in experiments....Use of randomly assigned intent to treat as an instrumental variable for treatment delivered eliminates this source of selection bias."

In Intent-to-Treat vs. Non-Intent-to-Treat Analyses under Treatment Non-Adherence in Mental Health Randomized Trials there is a nice discussion of ITT and IV methods with applications to clinical research. Below is a helpful description of IV in this context:

“Instrumental variables are assumed to emulate randomization variables, unrelated to unmeasured confounders influencing the outcome. In the case of randomized trials, the same randomized treatment assignment variable used in defining treatment groups in the ITT analysis is instead used as the instrumental variable in IV analyses. In particular, the instrumental variable is used to obtain for each patient a predicted probability of receiving the experimental treatment. Under the assumptions of the IV approach, these predicted probabilities of receipt of treatment are unrelated to unmeasured confounders in contrast to the vulnerability of the actually observed receipt of treatment to hidden bias. Therefore, these predicted treatment probabilities replace the observed receipt of treatment or treatment adherence in the AT model to yield an estimate of the as-received treatment effect protected against hidden bias when all of the IV assumptions hold.”

A great example of IV and ITT applied to health care can be found in the Oregon Medicaid experiment studies by Finkelstein and colleagues (Baicker et al., 2013; Taubman et al., 2014) - see The Oregon Medicaid Experiment, Applied Econometrics, and Causal Inference.

Over at the Incidental Economist, there was a nice discussion of ITT in the context of medical research that does a good job of explaining the rationale as well as when departures from ITT make more sense (such as safety and non-inferiority trials).

See also:  
Instrumental Explanations of Instrumental Variables

A Toy IV Application

Other IV Related Posts

References: 

Angrist, J. D., & Pischke, J.-S. (2015). Mastering 'Metrics: The Path from Cause to Effect. Princeton University Press.

Gupta, S. K. (2011). Intention-to-treat concept: A review. Perspectives in Clinical Research, 2(3), 109–112. http://doi.org/10.4103/2229-3485.83221

Ten Have, T. R., Normand, S.-L. T., Marcus, S. M., Brown, C. H., Lavori, P., & Duan, N. (2008). Intent-to-Treat vs. Non-Intent-to-Treat Analyses under Treatment Non-Adherence in Mental Health Randomized Trials. Psychiatric Annals, 38(12), 772–783. http://doi.org/10.3928/00485713-20081201-10

"The Oregon Experiment--Effects of Medicaid on Clinical Outcomes," by Katherine Baicker, et al. New England Journal of Medicine, 2013; 368:1713-1722. http://www.nejm.org/doi/full/10.1056/NEJMsa1212321

Taubman, S. L., Allen, H. L., Wright, B. J., Baicker, K., & Finkelstein, A. N. (2014). Medicaid Increases Emergency-Department Use: Evidence from Oregon's Health Insurance Experiment. Science. Published online 2 January 2014. DOI: 10.1126/science.1246183

Detry, M. A., & Lewis, R. J. (2014). The Intention-to-Treat Principle: How to Assess the True Effect of Choosing a Medical Treatment. JAMA, 312(1), 85-86. doi:10.1001/jama.2014.7523


Tuesday, June 6, 2017

Professional Science Master's Degree Programs in Biotechnology and Management

As an undergraduate I always had an interest in biotechnology and molecular genetics. However, I did not have a strong science background from high school, and lab work did not particularly appeal to me. I also recognized early on that science does not occur in a vacuum - it is subject to social, political, economic, and financial forces. This drew me to the field of economics, specifically public choice theory.

When it came time for graduate school I was still torn. I wasn't interested in an MBA, and despite minoring in mathematics I soon discovered that a background lacking topology or real analysis made a PhD in economics a long shot. However, I really liked economics. The combination of mathematically precise theories (microeconomics/game theory) and empirically sound methods (econometrics) provided a powerful framework for applied problem solving. And I still had an interest in genetics.

I had two advisers make recommendations that got me thinking outside the box. One suggested that I would ultimately find a niche combining economics and genetics. The other suggested I look at programs like the Bioscience Management program offered at the time at George Mason University. Ultimately, that is the direction I went. While there were not a lot of programs like that available at the time, the Agriculture Department at Western Kentucky University provided enough flexibility in their master's program to structure a curriculum with an emphasis in bioscience economics. In this program I completed coursework in biostatistics, genetics, and applied economics. I was able to work on research projects analyzing consumer perceptions of biotechnology and biotech trait resistance management using tools from econometrics, game theory, and population genetics. Additionally, I took courses in applied economics and finance from both the Department of Agriculture and the College of Business, where I was exposed to tools related to investment analysis, options pricing, and the analysis and valuation of biotech companies, as well as the impacts of technological change and biotechnology on food and economic development.

With this combination of quantitative training and applied work I have been able to leverage SAS, R, and Python to solve challenging problems across a number of professional analytics roles.

I have also noticed a growing number of professional science master's programs that seem very similar to the program I completed over 10 years ago.

According to the National Professional Science Master's Association:

"Professional Science Master's (PSMs) are designed for students who are seeking a graduate degree in science or mathematics and understand the need for developing workplace skills valued by top employers. A perfect fit for professionals because it allows you to pursue advanced training and excel in science or math without a Ph.D., while simultaneously developing highly-valued business skills....PSM programs consist of two years of coursework along with a professional component that includes business, communications and/or regulatory affairs."

In 2012 there was an article in Science detailing these degrees and some data on salaries, which seemed attractive. According to the article, the first program was officially offered in 1997, reaching 140 programs by 2009 and over 247 at the time of printing.

This commentary from the article corroborates how I feel about my experience:

“There is a tendency for students to buy into the line that if you don't get a Ph.D., you're not a serious professional, that you're wasting your mind,” she says. After spending a decade talking with PSM students and graduates, she is certain that’s not true. “There is so much potential for growth and satisfaction with a PSM degree. You can become a person you didn’t even know you wanted to be.”

Below are some programs that look interesting to me and that students interested in this option should check out (there is a program locator you can find here). Similar to my master's, many of these programs are a mashup of biology/biotech and applied economics and business degrees.

George Mason University- PSM Bioinformatics Management

University of Illinois - Agricultural Production

Cornell- MPS Agriculture and Life Sciences

Washington State University - PSM Molecular Biosciences

Middle Tennessee State University - PSM Biotechnology

California State - MS Biotechnology/MBA 

Johns Hopkins - MBA/MS Biotechnology

Rice - PSM Bioscience and Health Policy

North Carolina State University - MBA (Biosciences Mgt Concentration)

Purdue/Kelley - MS-MBA (not a heavy science emphasis but a very cool degree regardless from great schools)

See also:
Analytical Translators
Why Study Agricultural/Applied Economics