Sunday, June 11, 2017

Instrumental Variables vs. Intent to Treat

"ITT analysis includes every subject who is randomized according to randomized treatment assignment. It ignores noncompliance, protocol deviations, withdrawal, and anything that happens after randomization. ITT analysis is usually described as 'once randomized, always analyzed'."

"ITT analysis avoids overoptimistic estimates of the efficacy of an intervention resulting from the removal of non-compliers by accepting that noncompliance and protocol deviations are likely to occur in actual clinical practice" 
- Gupta, 2011

In Mastering 'Metrics, Angrist and Pischke describe intent-to-treat analysis:

"In randomized trials with imperfect compliance, when treatment assignment differs from treatment delivered, effects of random assignment...are called intention-to-treat (ITT) effects. An ITT analysis captures the causal effect of being assigned to treatment."

While treatment assignment is random, non-compliance is not! Angrist and Pischke point out that comparing subjects by treatment actually delivered, rather than assigned, introduces selection bias. However, this can be handled:

"ITT effects divided by the difference in compliance rates between treatment and control groups capture the causal effect"

Say what? Well, how does that work?

Let's look at this. Suppose we have a randomized trial of a treatment with outcome Y, where Z = 1 if assigned to the treatment group and 0 if assigned to control. An intent-to-treat effect (omitting controls) could be estimated with the following regression:

Y = b0 + b1*Z + e (1)  ITT or 'reduced form'

where b1 = COV(Y, Z) / V(Z)

If we let D = 1 for those in the study that actually received treatment (i.e., compliers) and D = 0 indicate the non-treated or non-compliers, then the difference in compliance rates between treatment and control groups can be estimated as:

D = b0 + b2*Z + e (2) '1st stage'

where b2 = COV(D, Z) / V(Z)

It turns out, as suggested by Angrist and Pischke, that dividing our ITT effect by the difference in compliance rates is precisely the ratio of the reduced form to the first stage estimate. Mathematically, this is the instrumental variables (IV) estimator:

 b(IV) = b1/b2 = COV(Y, Z) / COV(D, Z)
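
A quick simulation can make this concrete. The sketch below (all variable names and numbers are my own, illustrative choices, not from the book) generates random assignment Z, imperfect compliance D driven partly by an unobservable u that also drives the outcome Y, and then computes the reduced form, the first stage, and their ratio:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Unobserved confounder: drives both compliance and the outcome,
# which biases a naive comparison of treated vs. untreated.
u = rng.normal(size=n)

# Random assignment Z -- the intent-to-treat flag.
z = rng.integers(0, 2, size=n)

# Imperfect compliance: assignment raises the chance of treatment,
# but so does the unobservable u.
d = ((0.8 * z + 0.5 * u + rng.normal(size=n)) > 0.5).astype(float)

# Outcome: true treatment effect is 2.0, plus confounding through u.
y = 2.0 * d + 1.5 * u + rng.normal(size=n)

# (1) reduced form / ITT: b1 = COV(Y, Z) / V(Z)
b1 = np.cov(y, z)[0, 1] / np.var(z)

# (2) first stage (difference in compliance rates): b2 = COV(D, Z) / V(Z)
b2 = np.cov(d, z)[0, 1] / np.var(z)

# IV (Wald) estimator: reduced form divided by first stage,
# which lands close to the true effect of 2.0.
b_iv = b1 / b2

# For contrast, the naive as-treated comparison is biased upward by u.
b_naive = np.cov(y, d)[0, 1] / np.var(d)

print(b1, b2, b_iv, b_naive)
```

The diluted ITT effect b1 divided by the compliance gap b2 recovers the treatment effect, while the as-treated estimate b_naive overshoots because compliers differ systematically from non-compliers.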

The random assignment, or intent-to-treat flag Z, becomes our instrumental variable for treatment delivered, D. Angrist and Pischke describe IV this way:

 “The instrumental variables (IV) method harnesses partial or incomplete random assignment, whether naturally occurring or generated by researchers"

 This is a powerful method of eliminating selection bias:

"Use of randomly assigned intention to treat as an instrumental variable for treatment delivered eliminates this source of selection bias."

(For more information and some toy examples showing how this works, see the links below)

In Intent-to-Treat vs. Non-Intent-to-Treat Analyses under Treatment Non-Adherence in Mental Health Randomized Trials there is a nice discussion of ITT and IV methods with applications related to clinical research. Here is their treatment of IV in this context:

“Instrumental variables are assumed to emulate randomization variables, unrelated to unmeasured confounders influencing the outcome. In the case of randomized trials, the same randomized treatment assignment variable used in defining treatment groups in the ITT analysis is instead used as the instrumental variable in IV analyses. In particular, the instrumental variable is used to obtain for each patient a predicted probability of receiving the experimental treatment. Under the assumptions of the IV approach, these predicted probabilities of receipt of treatment are unrelated to unmeasured confounders in contrast to the vulnerability of the actually observed receipt of treatment to hidden bias. Therefore, these predicted treatment probabilities replace the observed receipt of treatment or treatment adherence in the AT model to yield an estimate of the as-received treatment effect protected against hidden bias when all of the IV assumptions hold.”

A great example of IV and ITT applied to health care can be found in Finkelstein et al. (2013 & 2014) - see The Oregon Medicaid Experiment, Applied Econometrics, and Causal Inference.

Over at the Incidental Economist, there was a nice discussion of ITT in the context of medical research that does a good job of explaining the rationale as well as when departures from ITT make more sense (such as safety and non-inferiority trials).

The regression algebra above can be informative. For example, if compliance were perfect, a simple comparison between treatment and control groups as indicated by the assignment indicator Z would yield unbiased treatment effects.

Y = b0 + b1*Z + e

This is simply the ITT estimate where b1 = COV(Y, Z) / V(Z), which is an unbiased estimate of the treatment effect when there is no selection bias.

With perfect compliance, the IV estimator collapses to give us the same result as the ITT estimate. In this case D = Z, and the regression

D = b0 + b2*Z + e 

is an identity, so b2 = COV(D, Z) / V(Z) = 1,

and the IV estimator gives us b1/1 = b1, which is our ITT estimate.

With imperfect compliance, the denominator departs from 1, allowing us to adjust our ITT estimate in a way that removes selection bias related to unobservables.
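
The perfect-compliance collapse described above is easy to verify with a few lines of Python (a sketch with made-up numbers; the true effect here is arbitrarily set to 2.0):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

z = rng.integers(0, 2, size=n)   # random assignment
d = z.astype(float)              # perfect compliance: D = Z exactly
y = 2.0 * d + rng.normal(size=n)

# ITT / reduced form: b1 = COV(Y, Z) / V(Z)
b1 = np.cov(y, z)[0, 1] / np.var(z, ddof=1)

# First stage: b2 = COV(D, Z) / V(Z) -- an identity when D = Z, so b2 = 1
b2 = np.cov(d, z)[0, 1] / np.var(z, ddof=1)

b_iv = b1 / b2  # collapses to b1, the ITT estimate

print(b1, b2, b_iv)
```

With D = Z the first stage is exactly 1, so the IV and ITT estimates coincide, as the algebra says they must.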

See also:  

Instrumental Explanations of Instrumental Variables

A Toy IV Application

Other IV Related Posts


Mastering ’Metrics:
The Path from Cause to Effect
Joshua D. Angrist & Jörn-Steffen Pischke

Gupta, S. K. (2011). Intention-to-treat concept: A review. Perspectives in Clinical Research, 2(3), 109–112.

Ten Have, T. R., Normand, S.-L. T., Marcus, S. M., Brown, C. H., Lavori, P., & Duan, N. (2008). Intent-to-Treat vs. Non-Intent-to-Treat Analyses under Treatment Non-Adherence in Mental Health Randomized Trials. Psychiatric Annals, 38(12), 772–783.

"The Oregon Experiment--Effects of Medicaid on Clinical Outcomes," by Katherine Baicker, et al. New England Journal of Medicine, 2013; 368:1713-1722.

Medicaid Increases Emergency-Department Use: Evidence from Oregon's Health Insurance Experiment. Sarah L. Taubman, Heidi L. Allen, Bill J. Wright, Katherine Baicker, and Amy N. Finkelstein. Science. Published online 2 January 2014. DOI:10.1126/science.1246183

Detry MA, Lewis RJ. The Intention-to-Treat Principle: How to Assess the True Effect of Choosing a Medical Treatment. JAMA. 2014;312(1):85-86. doi:10.1001/jama.2014.7523

Tuesday, June 6, 2017

Professional Science Master's Degree Programs in Biotechnology and Management

As an undergraduate I always had an interest in biotechnology and molecular genetics. However, I did not have a strong science background from high school, and lab work did not particularly appeal to me. I also recognized early on that science does not occur in a vacuum; it's subject to social, political, economic, and financial forces. This drew me to the field of economics, specifically public choice theory.

When it came time for graduate school I was still torn. I really wasn't interested in an MBA, and despite minoring in mathematics I soon discovered that a background lacking in topology or real analysis made a PhD in economics a long shot. However, I really liked economics. The combination of mathematically precise theories (microeconomics/game theory) and empirically sound methods (econometrics) provided a powerful framework for applied problem solving. And I still had an interest in genetics.

I had two advisers make recommendations that got me thinking outside the box. One suggested that ultimately I would find a niche combining both economics and genetics. The other suggested I look at programs like the Bioscience Management program being offered at the time at George Mason University. Ultimately, that is the direction I went. While there were not a lot of programs like that being offered at the time, the Agriculture Department at Western Kentucky University provided enough flexibility in their master's program to structure a curriculum with an emphasis in bioscience economics. In this program I completed coursework in biostatistics, genetics, and applied economics. I was able to work on research projects analyzing consumer perceptions of biotechnology and biotech trait resistance management using tools from econometrics, game theory, and population genetics. Additionally, I took courses in applied economics and finance from both the Department of Agriculture and the College of Business, where I was exposed to tools related to investment analysis, options pricing, and the analysis and valuation of biotech companies, as well as the impacts of technological change and biotechnology on food and economic development.

With this combination of quantitative training and applied work, I have been able to leverage SAS, R, and Python to solve challenging problems across a number of professional analytics roles.

I have also noticed a growing number of professional science master's programs that seem very similar to the program I completed over 10 years ago.

According to the National Professional Science Master's Association:

"Professional Science Master's (PSMs) are designed for students who are seeking a graduate degree in science or mathematics and understand the need for developing workplace skills valued by top employers. A perfect fit for professionals because it allows you to pursue advanced training and excel in science or math without a Ph.D., while simultaneously developing highly-valued business skills....PSM programs consist of two years of coursework along with a professional component that includes business, communications and/or regulatory affairs."

In 2012 there was an article in Science detailing these degrees and some salary data that seemed attractive. According to the article, the first program was officially offered in 1997, reaching 140 programs by 2009 and over 247 at the time of printing.

This commentary from the article echoes my own experience:

“There is a tendency for students to buy into the line that if you don't get a Ph.D., you're not a serious professional, that you're wasting your mind,” she says. After spending a decade talking with PSM students and graduates, she is certain that’s not true. “There is so much potential for growth and satisfaction with a PSM degree. You can become a person you didn’t even know you wanted to be.”

Below are some programs that look interesting to me and that students interested in this option should check out (there is a program locator you can find here). Similar to my master's, many of these programs are a mash-up of biology/biotech and applied economics and business degrees.

George Mason University- PSM Bioinformatics Management

University of Illinois - Agricultural Production

Cornell- MPS Agriculture and Life Sciences

Washington State University - PSM Molecular Biosciences

Middle Tennessee State University - PSM Biotechnology

California State - MS Biotechnology/MBA 

Johns Hopkins - MBA/MS Biotechnology

Rice - PSM Bioscience and Health Policy

North Carolina State University - MBA (Biosciences Mgt Concentration)

Purdue/Kelley - MS-MBA (not a heavy science emphasis but a very cool degree regardless from great schools)

See also:
Analytical Translators
Why Study Agricultural/Applied Economics

Monday, June 5, 2017

Game Theory with Python- TalkPython Podcast

Episode 104 of the TalkPython podcast discussed game theory.

Here are a few slices:

"Our guests this week, Vince Knight, Marc Harper, and Owen Campbell are here to discuss their Python project built to study and simulate one of the central problems in game theory, "The Prisoner's Dilemma"

"Yeah, so one of the things is how people end up cooperating. If we're all incentivized not to cooperate with each other yet we look around, we see all these situations where people are cooperating, so can we devise strategies that when we play this game repeatedly that coerce or convince our partners that they're better off cooperating with us than defecting against us..."

"...Okay, excellent. Give us a sense for some of the, you have some clever names for the different strategies or players, right? Strategy and player is kind of the same thing. You've got the basic ones. The cooperator and the defector, but what else?"

"Probably the most famous one is the tit for tat strategy. Because in Axelrod's original tournament, one of the interesting results that came out with his work was that this strategy was one of the most successful."

And then they get into incorporating machine learning:

"We've extended that method of taking a strategy based on some kind of machine learning algorithm, training it against the other strategies and then adding the fact of the tournaments to see about those. Right now, those are amongst the best players in the library, in terms of performance."

See my previous post for some concepts and examples from game theory that were discussed in this podcast. You can find more references from this podcast including papers, code etc. here.

Game Theory - A Basic Introduction

When someone else’s choices impact you, it helps to have some way to anticipate their behavior. Game Theory provides the tools for doing so (Nicholson, 2002). Game Theory is a mathematical technique developed to study choice under conditions of strategic interaction (Zupan, 1998). It allows for the analysis of interdependent situations.

In game theory, a game is a decision-making situation with interdependent behavior between two or more individuals (Harris, 1999). The individuals involved in making the decisions are the players. The possible choices made by the players are strategies. The outcomes of choices and strategies played are payoffs. Payoffs are often stated as levels of utility, income, profits, or some other stated objective particular to the game. A general assumption in game theory is that players seek the highest payoff attainable, preferring more utility to less (Nicholson, 2002).

When a decision maker takes into account how other players will respond to his choices, a utility-maximizing strategy may be found. It may allow one to predict in advance the actions, responses, and counter-responses of others and then choose optimal strategies (Harris, 1999). Such optimal strategies that leave players with no incentive to change their behavior are equilibrium strategies.

Games can be characterized by players, strategies, and payoffs. Below is one way to visualize a game.

Example: Overgrazing Game

                                  RANCHER 2:
                             Conserve      Overgraze
RANCHER 1:   Conserve        (20, 20)   |  (0, 30)
             Overgraze       (30, 0)    |  (10, 10)

In this game, the players are rancher '1' and rancher '2'.  They can play one of two strategies, to conserve or overgraze a commonly shared or 'public' pasture. Suppose rancher 1 chooses a strategy (picks a row). Their payoff is depicted by the first number in each cell. Rancher 2 will choose a strategy in return (picking a column). Rancher 2’s payoff is indicated by the second number in each cell. 

In this case, the best strategy for rancher 2 (no matter what rancher 1 chooses to do) is to overgraze, because the payoff for rancher 2 (the second number in each cell) associated with overgrazing is always the highest. Likewise, no matter what rancher 2 chooses to do, the best strategy for rancher 1 is to overgraze, because the first number in each cell (the payoff for rancher 1) associated with overgrazing is always the highest. Both players have a dominant strategy to overgraze. This represents an equilibrium strategy of {overgraze, overgraze}.

This outcome is a Nash equilibrium, and the game is an example of a prisoner's dilemma. In a Nash equilibrium, each player's choice is the best choice possible taking into consideration the choices of the other players (Zupan, 1998). This concept was formalized by the mathematician John Nash in his 1950 paper "Equilibrium Points in n-Person Games."

It’s easy to see that if the players would conserve, they could both be made better off because the strategy {conserve, conserve} yields payoffs (20,20) which are much higher than the Nash Equilibrium strategy’s payoff of (10,10). 

Just as competitive market forces elicit cooperation by coordinating behavior through price mechanisms, so too must players in a game find some means of coordinating their behavior if they wish to escape the sub-optimal Nash Equilibrium.  
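
The dominant-strategy reasoning above is mechanical enough to verify in a few lines of Python (a sketch; the strategy labels and payoff dictionary are just my encoding of the matrix above):

```python
from itertools import product

# Payoff matrix for the overgrazing game:
# payoffs[(s1, s2)] = (rancher 1's payoff, rancher 2's payoff)
C, O = "conserve", "overgraze"
payoffs = {
    (C, C): (20, 20),
    (C, O): (0, 30),
    (O, C): (30, 0),
    (O, O): (10, 10),
}
strategies = [C, O]

def best_response_1(s2):
    """Rancher 1's payoff-maximizing strategy given rancher 2's choice."""
    return max(strategies, key=lambda s1: payoffs[(s1, s2)][0])

def best_response_2(s1):
    """Rancher 2's payoff-maximizing strategy given rancher 1's choice."""
    return max(strategies, key=lambda s2: payoffs[(s1, s2)][1])

# A Nash equilibrium is a strategy profile where each player's choice
# is a best response to the other's.
nash = [
    (s1, s2)
    for s1, s2 in product(strategies, strategies)
    if s1 == best_response_1(s2) and s2 == best_response_2(s1)
]
print(nash)  # [('overgraze', 'overgraze')]
```

Overgrazing is each rancher's best response to anything the other does, so {overgraze, overgraze} is the unique equilibrium, just as the payoff-by-payoff argument showed.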

Some Additional Concepts  

Multiple Period Games - Multiple period games are games that are played more than once, or for more than one time period. If we imagine playing the prisoner's dilemma game multiple times, we have a multi-period game. If games are played perpetually they are referred to as infinite games (Harris, 1999).

Punishment Schemes - Punishment schemes are used to elicit cooperation or enforcement of agreements. 

In the game presented above, suppose both players wanted to cooperate to conserve grazing resources. If it turned out that rancher 2 cheated, then in the next period rancher 1 would refuse to cooperate. If the game is played repeatedly, rancher 2 would learn that if he sticks to the deal both players would be better off. In this way punishment schemes in multi-period games can elicit cooperation, allowing an escape from a Nash Equilibrium. This may not be possible in the single period games that we looked at before.

Tit-for-Tat - Tit-for-tat punishment mechanisms are schemes in which if one player fails to cooperate, the other player will refuse to cooperate in the next period. 

Trigger Strategy - In infinitely repeated games a trigger strategy involves a promise to play the optimal strategy as long as the other players comply (Nicholson, 2002).  

Grim Trigger Strategy - This is a trigger strategy that involves punishment for many periods if the other player does not cooperate. In other words, if one player defects when he should cooperate, the other player(s) will not offer the chance to cooperate again for a long time. As a result, both players will be confined to the Nash equilibrium for many periods or perpetually (Harris, 1999).

Trembling Hand Trigger Strategy - This is a trigger strategy that allows for mistakes. Suppose in the first instance player 1 does not realize that player 2 is willing to cooperate. Instead of resorting to a long period of punishment as in the grim trigger strategy, player 1 allows player 2 a second chance to cooperate. It may be the case that instead of playing the grim trigger strategy, player 1 invokes a single-period tit-for-tat punishment scheme in hopes of eliciting cooperation in later periods.

Folk Theorems - Folk theorems result from the conclusion that players can escape the outcome of a Nash equilibrium if games are played repeatedly, or are infinite period games (Nicholson, 2002). In general, folk theorems state that players will find it in their best interest to maintain trigger strategies in infinitely repeated games.
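
These repeated-game ideas can be sketched in Python as well. Below, a tit-for-tat player faces either another tit-for-tat player or an unconditional overgrazer in the overgrazing game from earlier (the function names and the 50-period horizon are my own illustrative choices):

```python
# Payoffs from the overgrazing game, from one player's perspective:
# (my move, opponent's move) -> my payoff
PAYOFF = {
    ("conserve", "conserve"): 20,
    ("conserve", "overgraze"): 0,
    ("overgraze", "conserve"): 30,
    ("overgraze", "overgraze"): 10,
}

def tit_for_tat(opponent_moves):
    """Conserve first, then copy the opponent's previous move."""
    return opponent_moves[-1] if opponent_moves else "conserve"

def always_overgraze(opponent_moves):
    """Unconditional defection: overgraze every period."""
    return "overgraze"

def play(strategy_a, strategy_b, periods=50):
    """Play the repeated game; return each player's total payoff."""
    moves_a, moves_b = [], []
    total_a = total_b = 0
    for _ in range(periods):
        a = strategy_a(moves_b)   # A reacts to B's past moves
        b = strategy_b(moves_a)   # B reacts to A's past moves
        moves_a.append(a)
        moves_b.append(b)
        total_a += PAYOFF[(a, b)]
        total_b += PAYOFF[(b, a)]
    return total_a, total_b

print(play(tit_for_tat, tit_for_tat))        # (1000, 1000): sustained cooperation
print(play(tit_for_tat, always_overgraze))   # (490, 520): punishment kicks in
print(play(always_overgraze, always_overgraze))  # (500, 500): the Nash payoffs
```

Mutual tit-for-tat sustains {conserve, conserve} at 20 per period, while meeting an unconditional overgrazer collapses play to the (10, 10) Nash payoffs after one period, which is exactly the punishment-scheme logic described above.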

See also:
Matt Bogard. "An Econometric and Game Theoretic Analysis of Producer and Consumer Preferences Toward Agricultural Biotechnology" Western Kentucky University (2005)

Matt Bogard. "An Introduction to Game Theory: Applications in Environmental Economics and Public Choice with Mathematical Appendix" (2012)

Matt Bogard. "Game Theory, A Foundation for Agricultural Economics" (2004)


Nicholson, Walter R. “Microeconomic Theory: Basic Principles and Extensions.” Southwestern Thomson Learning. U.S.A. (2002).

Browning, Edward K. and Mark A. Zupan. “Microeconomic Theory and Applications.” 6th Edition. Addison-Wesley Longman Inc. Reading, MA. (1999)

Harris, Frederick H. et al. “Managerial Economics: Applications, Strategy, and Tactics.” Southwestern College Publishing. Cincinnati, OH. (1999).

Saturday, June 3, 2017

In Praise of The Citizen Data Scientist

There was a really good article over at Data Science Central titled "The Data Science Delusion." Here is an interesting slice:

"This democratization of algorithms and platforms, paradoxically, has a downside: the signaling properties of such skills have more or less been lost. Where earlier you needed to read and understand a technical paper or a book to implement a model, now you can just use an off-the-shelf model as a black-box. While this phenomenon affects many disciplines, the vague and multidisciplinary definition of data science certainly exacerbates the problem."

It is true there is some loss of signal. However, companies may need to look for new signals as technology progresses and new forms of capital complement labor. It's this new labor-complementing role of capital (in the form of open source statistical computing packages and computing power) that is creating demand for those who can leverage these tools competently, without knowing all "the nitty-gritty mathematical academic formulas to everything about support vector machines or Kernels and stuff like that to apply it properly and get results."

Sure, as a result there are a lot of analytics programs popping up to take advantage of these advances, but it's also the reason programs like applied economics are becoming so popular. In fact, in promoting its program, Johns Hopkins University almost seems to echo some of the sentiment in the quotes above, but takes a positive spin:

"Economic analysis is no longer relegated to academicians and a small number of PhD-trained specialists. Instead, economics has become an increasingly ubiquitous as well as rapidly changing line of inquiry that requires people who are skilled in analyzing and interpreting economic data, and then using it to effect decisions about national and global markets and policy, involving everything from health care to fiscal policy, from foreign aid to the environment, and from financial risk to real risk." 

In fact, I admit that for a while I was a little disappointed my alma mater did not embrace the data science/analytics degree trend, offer more courses in applied programming, or incorporate languages like R into more courses. Now, while I think those things are great, I realize the more important data science skills are the analytical thinking and firm theoretical, statistical, and quantitative foundations that programs in economics and finance already offer at the undergraduate and master's levels. While formal data science training might be the way of the future, I would venture to say that the vast majority of today's 'data scientists' were academically trained in a quantitative discipline like the above and self-trained (perhaps via Coursera etc.) on the skills and tools most people think of when they think of data science. As I have said before, sometimes you don't need someone with a PhD in computer science or astrophysics. Sometimes you really just need a good MBA who understands regression and the basics of a left join.

The DSC article above concludes with a little jab at data science that I tend to agree with wholeheartedly:

"Great data science work is being done in various places by people who go by other names (analyst, software engineer, product head, or just plain old scientist). It is not necessary to be a card-carrying data scientist to do good data science work. Blasphemy it may be to say so, but only time will tell whether the label itself has value, or is only helping create a delusion." 

See also:

What you really need to know to be a data scientist
Super Data Science podcast - credit scoring
How to think like a data scientist to become one
What makes a great data scientist
Are data scientists going extinct
More on data science from actual data scientists

Tuesday, May 30, 2017

Multicollinearity.....just a bad joke?


"The worth of an econometrics textbook tends to be inversely related to the technical material devoted to  multicollinearity" - Williams, R. Economic Record 68, 80-1. (1992).  via Kennedy, A Guide to Econometrics (6th edition).

If you have never read Arthur S. Goldberger's treatment of multicollinearity in his well known text A Course in Econometrics you are missing some of the best reading in econometrics you will ever find. A few years ago Dave Giles gave a nice preview here:

Basically, Goldberger provides a lengthy discussion in his textbook of 'micronumerosity,' a term he makes up to parody multicollinearity, the excessive attention it is given in textbooks, and the resources practitioners spend attempting to 'detect' it (see Dave Giles' post).

For a quick review, multicollinearity can be characterized in multiple regression as a situation where there is correlation between explanatory variables. For instance, if we are estimating:

 y = b0 + b1x1 + b2x2 + b3x3 + e

and x2 and x3 are highly correlated, the amount of independent variation in each variable is reduced. With less information available to estimate the effects b2 and b3, these estimates become less precise, and their standard errors may be larger than they otherwise would be.
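
A quick Monte Carlo sketch (the numbers are illustrative) shows the effect: as the correlation between x2 and x3 rises, the OLS estimates of b2 stay centered on the truth, but their sampling variability grows, consistent with the variance inflation factor 1/(1 - r^2):

```python
import numpy as np

rng = np.random.default_rng(1)

def coef_sd(rho, reps=500, n=200):
    """Monte Carlo standard deviation of the OLS estimate of b2
    when corr(x2, x3) = rho."""
    estimates = []
    for _ in range(reps):
        x2 = rng.normal(size=n)
        # Construct x3 with the desired correlation to x2.
        x3 = rho * x2 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x2, x3])
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        estimates.append(beta[1])   # the estimate of b2
    return np.std(estimates)

sd_low = coef_sd(rho=0.0)
sd_high = coef_sd(rho=0.95)

# With rho = 0.95, VIF = 1/(1 - 0.95^2) is about 10.3, so the standard
# error of b2 should be roughly sqrt(10.3), or about 3x, larger.
print(sd_low, sd_high)
```

The coefficient is still unbiased in both cases; the cost of the collinearity is purely in precision, which is exactly Goldberger's point about when (and when not) to worry.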

As Goldberger advises, we should not spend a lot of resources applying various 'tests' for multicollinearity, but focus instead on whether its consequences really matter:

"Researchers should not be concerned with whether or not there really is collinearity. They may well be concerned with whether the variances of the coefficient estimates are too large-for whatever reason-to provide useful estimates of the regression coefficients" (Goldberger, 1991).

Below are some other posts I have previously written on the topic, addressing multicollinearity in the context of predictive vs inferential modeling etc.

From my discussion of multicollinearity in Linear Literalism and Fundamentalist Econometrics:

"Multicollinearity has a very different impact if your goal is prediction from when your goal is estimation. When predicting, multicollinearity is not really a problem provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model." - Statist. Sci. Volume 25, Number 3 (2010), 289-310.

See also:

Paul Allison on Multicollinearity - when not to worry

Ridge Regression

Monday, April 10, 2017

More on Data Science from Actual Data Scientists

Previously I wrote a post titled What do you really need to know to be a data scientist? Data science lovers and haters. In that post I made the general argument that this is a broad space, and there is a lot of contention about the level of technical skill and tools one must master to be considered a 'real' data scientist vs. getting labeled a 'fake' data scientist or 'poser' or whatever. To me, it's all about leveraging data to solve problems, and most of that work is about cleaning and prepping data. It's process. In an older KDnuggets article, economist/data scientist Scott Nicholson makes a similar point:

GP: What advice you have for aspiring data scientists?

SN: Focus less on algorithms and fancy technology & more on identifying questions, and extracting/cleaning/verifying data. People often ask me how to get started, and I usually recommend that they start with a question and follow through with the end-to-end process before they think about implementing state-of-the-art technology or algorithms. Grab some data, clean it, visualize it, and run a regression or some k-means before you do anything else. That basic set of skills surprisingly is something that a lot of people are just not good at but it is crucial.

GP: Your opinion on the hype around Big Data - how much is real?

SN: Overhyped. Big data is more of a sudden realization of all of the things that we can do with the data than it is about the data themselves. Of course also it is true that there is just more data accessible for analysis and that then starts a powerful and virtuous spiral. For most companies more data is a curse as they can barely figure out what to do with what they had in 2005.

So getting your foot in the door in a data science field doesn't mean mastering Hive or Hadoop, apparently. And this does not sound like PhD-level rocket science either. Karolis Urbonas, Head of Business Intelligence at Amazon, has recently written a couple of similarly themed pieces, also at KDnuggets:

How to think like a data scientist to become one

"I still think there’s too much chaos around the craft and much less clarity, especially for people thinking of switching careers. Don’t get me wrong – there are a lot of very complex branches of data science – like AI, robotics, computer vision, voice recognition etc. – which require very deep technical and mathematical expertise, and potentially a PhD… or two. But if you are interested in getting into a data science role that was called a business / data analyst just a few years ago – here are the four rules that have helped me get into and are still helping me survive in the data science."

He emphasizes basic data analysis, statistics, and coding to get started. The emphasis again is not on specific tools or degrees, but on the process and the ability to use data to solve problems. Note in the comments there is some pushback on the level of expertise required, but Karolis actually addressed that when he mentioned very narrow and specific roles in AI, robotics, etc. Here he's giving advice for getting started in the broad diversity of data science roles outside those narrow tracks. The issue is that some people in data science want to narrow the scope to the exclusion of much of the work done by business analysts, researchers, engineers, and consultants creating much of the value in this space (again, see my previous post).

What makes a great data scientist?

"A data scientist is an umbrella term that describes people whose main responsibility is leveraging data to help other people (or machines) making more informed decisions….Over the years that I have worked with data and analytics I have found that this has almost nothing to do with technical skills. Yes, you read it right. Technical knowledge is a must-have if you want to get hired but that’s just the basic absolutely minimal requirement. The features that make one a great data scientist are mostly non-technical."

1. Great data scientist is obsessed with solving problems, not new tools.

"This one is so fundamental, it is hard to believe it’s so simple. Every occupation has this curse – people tend to focus on tools, processes or – more generally – emphasize the form over the content. A very good example is the on-going discussion whether R or Python is better for data science and which one will win the beauty contest. Or another one – frequentist vs. Bayesian statistics and why one will become obsolete. Or my favorite – SQL is dead, all data will be stored on NoSQL databases."