& Thales' Press

Tuesday, September 27, 2016

New Book: Business Intelligence with R by Dwight Barry

If you are new to data science and learning the R language, let me recommend this new gem of a book, Business Intelligence with R, by my friendr (the term I just coined to describe R users who help each other), Dwight Barry: https://leanpub.com/businessintelligencewithr

Business Intelligence with R serves as a great cookbook that can save you hours of frustration learning how to get the basics going. Even if you're an old pro, the book serves as a handy desk reference.

Also, please consider the personal note that Dwight sent to all of his beta readers:
Perhaps most importantly, I've also decided to give all proceeds to the Agape Girls Junior Guild, which is a group of middle-school girls who do fundraising for mitochondrial disorder research at Seattle Children's Research Institute and Seattle Children's Hospital. While the minimum price for this book will always be free, if you're the type who likes to "buy the author a coffee," know that your donation is supporting a better cause than my already out-of-control coffee habit. :-)
Business Intelligence with R serves a greater cause.

Wednesday, February 10, 2016

Becoming a Business Analytics Jedi: An application of values-framed decision making

I will be speaking at the Georgia Tech Scheller College of Business on February 18, 2016 on the following topic:
In the current rush to adopt data-driven analytics, discussions about algorithms, programming tools, and big data tend to dominate the practice of business analytics. But we are defined by our choices, our values, and our preferences. Data and business analytics that do not start with this recognition fail to support the human-centered reason for decision making. This is the way of the Sith. A Jedi, however, knows that framing business analytics in terms of the values and preferences of decision makers, and the uncertainty of achieving them, employs the tools of decision and data science in the wisest way. In this discussion, we will think about the principles of high-quality decisions, learn how to frame a business analytics problem, and learn how to use information in the most efficient way to create value and minimize risk.
The discussion will include a demonstration of the Analytica modeling software.

If you're in the Atlanta area, I would love for you to join me in the discussion.

A special thanks to Dr. Beverly Wright for organizing this event!

Interview with Atlanta Business Radio

Recently, Brian McCarthy and I had some fun being interviewed by Ryan McPherson of Atlanta Business Radio.

You can also listen to the interview here.

Monday, January 19, 2015

Teaching the Love of Thinking and Discovery

This post is going to be different from what I've published here before. I'm not going to explain something or attempt to be clever. Instead, I want to share an idea, an open-ended kind of idea for which, at this point, I have no conclusions. First, let me share some background.

The other day I shared a TED Talk by Conrad Wolfram ("Teaching kids real math with computers") as an update on LinkedIn and on my personal Facebook. Please take the time to listen to this if you have not already. I think this is actually vitally important to the well-being of our children and how they gain an education.

My friend and colleague, James Mitchell, made the following comment on the original update: "A great talk. My daughter's life would have been so much easier and better with this approach to teaching math. Wolfram talked about all her complaints." They were my complaints, too. A few of the comments made on my Facebook page included "Math is hard" and "I hate math. I never use it." Apparently, the same complaints are shared by more than just two people.

Curiosity photo by Rosemary Ratcliff, provided courtesy of FreeDigitalPhotos.net

I've been thinking about this TED Talk almost non-stop since I watched it, and I'm beginning to think that one way to achieve the idea here is to provide mathematics education outside of traditional school environments. By that, I don't mean that we should advocate that schools quit teaching math; rather, I think we need to start providing private forums in which kids who are interested in math can learn math in the same way they might learn and participate in extracurricular sports or arts activities that are not offered in a traditional school. I'm currently convinced the program must be private and free from policy-driven curricula that "teach to the test" and arbitrary performance criteria. This is for fun, but a special kind of fun.

What if there were mathematics/programming academies that taught math this way? Maybe it would be a private academy for self-motivated kids who want to learn math, maybe offered after their normal school day or on the weekends. It would follow the approaches advocated by Conrad Wolfram, Paul Lockhart, and Keith Devlin. It would not confer a degree, diploma, or certificate of any sort other than a letter that describes the areas of inquiry and the completion of certain milestone projects that were self-selected by the student and mentored by the "professors." For older students, these projects might include publishing papers in journals as well as serving as submissions to more traditional math and science fairs. This would not be an after-school tutoring program for students who want to improve their grades to passing levels or gain extra points on their college admission tests.

In other words, the immediate purpose of the school would only be to satisfy the natural curiosity of self-motivated students. I believe such an academy would eventually provide economic benefits to its students because it would teach both creative and structured thinking that the market would eventually reward, but the near-term benefit would be to remediate the destruction of natural curiosity created by our current systems and simply help our youngest achieve what they want to achieve. I envision this as a kind of math zendo where children learn the art driven by intrinsic motivation and encouragement from like-minded but more mature leaders.

Of course, as ideas take hold in our minds, so do the doubts. I think the difficult aspect of this idea would be financing the program. Currently, I see the finances being provided in part by student fees, some voluntary time offered by teachers, and private donations. I would want to structure the student fees such that no interested student would be unable to participate because they could not afford them.

Much remains to be considered here. Maybe this has been done before or is being done right now. I don't know. Regardless, I welcome any feedback you might offer.

Labels: ,

Wednesday, January 07, 2015

An Interesting Christmas Gift

Over the holidays, the New York Times delivered an unusual juxtaposition of headlines and content, and an apparent lack of self-awareness, eliciting such a hearty chuckle from its readers as to make the cheerful Old Saint jealous.

[image originally provided by @ddmeyer on Twitter]

To those imbued with the skill of basic high school Algebra 1, the information in the article about Sony’s revenues for the first four days of release of “The Interview” was enough to solve a unit value problem. If we let R = the number of rentals, and S = the number of sales; then,
  • R + S = 2 million 
  • $6*R + $15*S = $15 million 
With a little quick symbolic manipulation, we see that S = 1/3 million in sales and R = 5/3 million in rentals. That exercise provided just enough mental stimulation and smug self-righteousness to prepare for the day’s sudoku and crossword puzzles. #smug #math
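For readers who want to check the algebra, the same two equations can be solved in a couple of lines of R (a minimal sketch of my own; it is not part of the original article):

# Coefficient matrix for: R + S = 2 (million transactions)
#                         6*R + 15*S = 15 (million dollars)
A <- matrix(c(1, 1,
              6, 15), nrow = 2, byrow = TRUE)
b <- c(2, 15)
solve(A, b)
# [1] 1.6666667 0.3333333  i.e., R = 5/3 million rentals and S = 1/3 million sales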

However, not too far into the sudoku puzzle we might realize that a deeper, more instructive problem exists here, a problem that actually permeates all of our daily lives. That problem is related to the precision of the information we have to deal with in planning exercises or, say, garnering market intelligence, etc. A second reading of the article reveals that the sales values, both the total transactions and the total value of them, were reported as approximations. In other words, if the sources at Sony followed some basic rules of rounding, the total number of transactions could range from 1.5 million to 2.4 million, and the total value might range from $14.5 million to $15.4 million. This might not seem like a problem at first consideration. After all, 2 million is in the middleish of its rounding range as is $15 million. Certainly the actual values determined by the simple algebra above point to a good enough approximate answer. Right? Right?

To see if this is true, let’s reassign the formulas above in the following way.
  • R + S = T 
  • $6*R + $15*S = V 
where T = total transactions, and V = total value. Again, with some quick symbolic manipulation, we can get exact answers for R and S across a range of values for T and V.
  • S = 1/9 * V - 2/3 * T 
  • R = T - S 
Doing this, we now notice something quite at odds with our intuition: the range of variation between sales and rentals can be quite large, as we see in this scatter plot:

[Fig. 1: The distribution of total transaction values for various combinations of rental and direct sales numbers.]

Here we see that the rental numbers could range from about 800 thousand to 2.4 million, while the direct sales could range from nearly 0 to 700 thousand! Maybe more instructive is to consider the range of the ratio of the rentals to direct sales:

[Fig. 2: The distribution of the ratio of rentals to direct sales for various combinations of rental and direct sales numbers.]

If we blithely assume that the reported sales figures were precise enough to support believing that the actual numbers of rentals and unit sales were close to our initial result, we could be astoundingly wrong. The range of this ratio could run from about 1.11 (for 1.5 million total transactions and $15.4 million total value) to 215 (for 2.4 million total transactions and $14.5 million total value). If we were trying to glean market intelligence from these numbers on which to base our own operational or marketing activities, we would face quite a conundrum. What’s the best estimate to use?
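A small R sketch (mine, not the article's) makes this sensitivity easy to reproduce by sweeping a grid over the two rounding ranges:

# Rounding ranges implied by the reported "approximately 2 million" and "$15 million"
tot <- seq(1.5, 2.4, by = 0.01)    # total transactions, millions
val <- seq(14.5, 15.4, by = 0.01)  # total value, $ millions
grid <- expand.grid(tot = tot, val = val)
grid$S <- grid$val / 9 - 2 * grid$tot / 3  # direct sales, millions
grid$R <- grid$tot - grid$S                # rentals, millions
grid$ratio <- grid$R / grid$S
range(grid$S)      # roughly 0.01 to 0.71 million
range(grid$R)      # roughly 0.79 to 2.39 million
range(grid$ratio)  # roughly 1.1 to 215
plot(grid$S, grid$R, pch = ".")    # the scatter of feasible combinations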
Fortunately, we can turn to probabilistic reasoning to help us out. Let’s say we consult a subject matter expert (SME) who gives us a calibrated range and distribution for the sales assumptions such that the range of each distribution stays mostly within the rounding range we specify.

[Fig. 2a, b: The hypothetical distribution of the (a) total sales transactions and (b) total value assessed by our SME.]

Using the sample values underlying these distributions in our last set of formulas, we observe that in all likelihood - an 80th percentile likelihood – the actual ratio of the rentals to sales falls in a much narrower range – the range of 3 to 9, not 1.11 to 215.

[Fig. 3: The 80th percentile prediction interval for the ratio of the rentals to sales falls in the range of 3 to 9.]
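Because the SME's assessed distributions aren't reproduced here, the following R sketch uses stand-in assumptions: normal distributions centered on the reported values with most of their mass inside the rounding ranges. The exact percentiles will differ from the figure, but the narrowing effect is the point.

set.seed(42)
n <- 10000
tot <- rnorm(n, mean = 2.0,  sd = 0.15)  # total transactions, millions (assumed spread)
val <- rnorm(n, mean = 15.0, sd = 0.15)  # total value, $ millions (assumed spread)
S <- val / 9 - 2 * tot / 3               # direct sales
R <- tot - S                             # rentals
quantile(R / S, probs = c(0.10, 0.90))   # an 80% prediction interval for the ratio;
                                         # with these spreads it lands near the 3-to-9 range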

Our manager may push back on this by saying that our SME doesn’t really have the credibility to use the distributions assessed above. She asks, "What if we stick with maximal uncertainty within the range?" In other words, what if, instead of assessing a central tendency around the reported values with declining tails on each side, we assume a uniform distribution along the range of sales values (i.e., every value in the range is equally probable)?

[Fig. 4a, b: We replace our SME supplied distribution for (a) total sales transactions and (b) total value with one that admits an insufficient reason to suspect that any value in our range is more likely than any other.]

What is the result? Well, we see that even with the assumption of maximal uncertainty, the most likely range expands by a factor of 2.7 (i.e., from 3-9 to 1.7-18), yet it still remains manageable because the extreme edge cases are ruled out, not as impossible but as fairly unlikely.

[Fig. 5: Replacing our original SME distributions that had peaks with uniform distributions flattens out the distribution of our ratio of rentals to sales, causing the 80th percentile prediction interval to widen. The new range runs from about 1.7 to 18.]
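In the same sketch, answering the manager's maximal-uncertainty challenge only requires swapping the two sampling lines for uniform draws across the full rounding ranges:

set.seed(42)
n <- 10000
tot <- runif(n, 1.5, 2.4)                # "insufficient reason": every value equally likely
val <- runif(n, 14.5, 15.4)
S <- val / 9 - 2 * tot / 3
R <- tot - S
quantile(R / S, probs = c(0.10, 0.90))   # the 80% interval widens toward roughly 1.7 to 18,
                                         # but stays far from the 1.11-to-215 extremes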

The following graph displays the full range of sales and rental variation that is possible depending on our degrees of belief (as represented by our choice of distribution) about the range of total transactions and total value.

[Fig. 6: A scatter plot that demonstrates the distribution of direct sales and rental combinations as conditioned by our choice of distribution type.]

By focusing on the 80th percentile range of outcomes in the ratio of rentals to sales, we can significantly narrow the credible range for estimating the rentals and direct sales from the approximate information we were given.

[Fig. 7: A scatter plot that demonstrates the distribution of direct sales and rental combinations as conditioned by our choice of distribution type, constrained only to those values in the 80th percentile prediction interval.]

Precise? Not within a hair’s breadth, no, but the degree of precision we obtain by incorporating probabilities into our analysis (as opposed to relying on just a best guess with no understanding of the implications of the range of the assumptions) improves by a factor of 13.1 (assuming maximum uncertainty) to 35.2 (trusting our SME). If our own planning depends on an understanding of this sales ratio, we can exercise more prudence in the effective allocation of the resources required to address it. Now, when our manager asks, “How do you know the actual values aren’t near the edge cases?”, we can respond by saying that we don’t know precisely, but simple algebra combined with probabilities suggests that the actual values most likely are not.

Labels: , , , , , , ,

The Zen of Decision Making

I copied the following nineteen zen-like koans from the website devoted to the Python programming language (don't leave yet...this isn't really going to be about programming!).
  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Complex is better than complicated.
  • Flat is better than nested.
  • Sparse is better than dense.
  • Readability counts.
  • Special cases aren't special enough to break the rules.
  • Although practicality beats purity.
  • Errors should never pass silently.
  • Unless explicitly silenced.
  • In the face of ambiguity, refuse the temptation to guess.
  • There should be one-- and preferably only one --obvious way to do it.
  • Although that way may not be obvious at first unless you're Dutch.
  • Now is better than never.
  • Although never is often better than *right* now.
  • If the implementation is hard to explain, it's a bad idea.
  • If the implementation is easy to explain, it may be a good idea.
  • Namespaces are one honking great idea -- let's do more of those!

The koans are supposed to communicate the essence of the guiding principles of programming. Their zen-like fashion is intended to motivate reflection and discussion more so than state explicit rules. In fact, there is a twentieth unstated (Or is it? How's that for zen-like clarity?) principle that you must discover for yourself.

Good aphorisms often find meaning beyond their initial intent. That's the way general, somewhat ambiguous guidance works and why some aphorisms last for so long in common parlance. They're malleable to one's circumstances and provide a kind of structure on which to hinge one's thoughts, concerns, and aspirations (I'm pretty sure horoscopes and Myers Briggs work this way). Some of these aphorisms, maybe all of them, struck me as not only useful as guiding principles for programming but also for decision management in general. Seriously. Go back and consider them again, Grasshopper.

So, let me ask you:
  • In what way is decision management like programming?
  • How would you interpret these principles, if at all, for use in the role of decision making?
  • What do you think is the missing principle?

Labels: ,

Monday, October 20, 2014

Moar Accuracies

You've probably heard the saying, "It's better to be mostly accurate than precisely wrong." But what does that mean exactly? Aren't accuracy and precision basically the same thing?

Accuracy relates to the likelihood that outcomes fall within a prediction band or measurement tolerance. A prediction/measurement that comprehends, say, 90% of actual outcomes is more accurate than a prediction/measurement that comprehends only 30%. For example, let's say you repeatedly estimate the number of marbles in several Mason jars mostly full of marbles. An estimate of "more than 75 marbles and less than 300 marbles" is probably going to be correct more often than "more than 100 marbles but less than 120 marbles." You might say that's cheating. After all, you can always make your ranges wide enough to comprehend any range of possibilities, and that is true. But the goal of accuracy is just to be more frequently right than not (within reasonable ranges), and wider ranges accomplish that goal. As I'll show you in just a bit, accuracy is very powerful by itself.

Precision relates to the width of the prediction/measurement band relative to the mean of the prediction/measurement. A precision band that varies around a mean by +/- 50% is less precise than one that varies by +/- 10%. When people think about a precise prediction/measurement, they usually think about one that is both accurate and precise. A target pattern usually helps make a distinction between the two concepts.
The canonical target pattern explanation of accuracy and precision.

The problem is that people jump past accuracy before they attempt to be precise, thinking that the two are synonymous. Unfortunately, unrecognized biases can make precise predictions extremely inaccurate, hence the proverbial saying. Jumping ahead of the all too important step of calibrating accuracy is where the "precisely wrong" comes in.

Good accuracy trucks many more miles in most cases than precision, especially when high quality, formal data is sparse. This is because the marginal cost of improving accuracy is usually much less than the marginal costs of improved precision, but the payoff for improved accuracy is usually much greater. To understand this point, take a look again at the target diagram above. The Accurate/Not Precise score is higher than the Not Accurate/Precise score. In practice, a lot of effort is required to create a measurement situation that effectively controls for the sources of noise and contingent factors that swamp efforts to be reasonably more precise. Higher precision usually comes at the cost of tighter control, heightened attention on fine detail, or advanced competence. There are some finer nuances even here in the technical usages of the terms, but these descriptions work well enough for now.

Be careful, though - being more accurate is not just a matter of going with your gut instinct and letting that be good enough. Our gut instinct is frequently the source of the biases that make our predictions look as if we were squiffy when we made them. We usually achieve improved accuracy through the deliberative process of accounting for the causes and sources of the variation (or range of outcomes) we might observe in the events we're trying to measure or predict. The ability to do this reflects the depth of expert knowledge we possess about the system we're addressing, the degree of nuance we can bring to bear to explain the causes of variation, and a recognition of the sources of bias that may affect our predictions. In fact, achieving good accuracy usually begins with recognizing that we may be biased at all (and we usually are) and understanding why.

Once we've achieved reasonable accuracy about some measurement of concern, it might then make sense to improve our precision of the measurement if the payoff is worth the cost of intensified attention and control. In other words, we only need to improve our precision when it really matters.
[Image from FreeDigitalPhotos.net by Salvatore Vuono.]

Labels: , , ,

Monday, September 22, 2014

Are Your Spreadsheets the Problem?

Mr. Patrick Burns at Burns Statistics (no, not that Mr. Burns) provides an excellent overview of the hidden dangers that lurk in your spreadsheets. Guess what. The problems aren't just programming errors and the potential for their harm, but errors that are inherent to the spreadsheet software itself. That's right. Before your analysts even make an error, the errors are already built in. Do you know what's lurking in your spreadsheets? Well, do you?

Before you answer that question, ask yourself these:
  1. What quality assurance procedures does our organization employ to ensure that our spreadsheets are free of errors of math, units conversion, and logic? 
  2. What effort does our organization undertake to make sure that the decision makers and consumers of the spreadsheet analysis comprehend the assumptions, intermediate logic, and results in our spreadsheets? 
  3. How do we ensure that spreadsheet templates (or repurposed spreadsheets or previously loved spreadsheets) are actually contextually coherent with the problem framing and subsequent decisions that the spreadsheets are intended to support? 
Each question actually addresses a hierarchically more important level of awareness and intention in our organizations. The first question addresses the simple rules of math and whether they are satisfied. The second question addresses the level of agreement that the math/logic hangs together in a meaningful way, that it is capable of supporting valid and reasonable insights, inferences, or accurate predictions about the system or problem it describes, and that everyone understands why. The last question, the most important question, IMHO, addresses whether our analyses point in the right direction of inquiry at all.

My suspicion is that errors of the first level run amok much more than people are willing to admit, but their prevalence is relatively easy to estimate given our knowledge about the rates at which programming errors occur, why they occur, and how they propagate geometrically through spreadsheets. Mr. Burns recommends the programming language R as a better solution than spreadsheets, one that is easier for your analysts to adopt than they might currently imagine. I agree. I happen to like R a lot, but I love Analytica as a modeling environment more. But the solution to our spreadsheet modeling problems isn't going to be completely resolved by our choice of software and programming mastery of it.

My greater suspicion is that errors of the second and third level are rarely addressed and pose the greatest level of risk to our organizations because we let spreadsheets (which are immediately accessible) drive our thinking instead of letting good thinking determine the structure and use of our spreadsheets. To rid ourselves of the addiction to spreadsheets and their inherent risks, we have to do the hard work first by starting with question 3 and then working our way down to 1. Otherwise, we're being careless at worst and precisely wrong at best.

(Originally published at LinkedIn.)

Labels: , , ,

Thursday, July 17, 2014

When A Picture is Worth √1000 Words

This morning @WSJ posted a link to the story about Microsoft’s announcement of its plans to lay off 18,000 employees. This picture (as captured on my iPhone)...

[click image to enlarge]

...accompanied the tweet, which is presumably available through their paywall link.

While I’m really sorry to hear about the Microsoft employees who will be losing their jobs, I am simply outraged at the miscommunication in the pictured graph. (This news appeared to me first on Twitter, and the seemingly typical response on Twitter is hyperbolic outrage.)

Here’s the problem as I see it: the graph communicates one-dimensional information with two-dimensional images. By doing so, it distorts the actual intensity of the information the reporters are supposed to be conveying in an unbiased manner. In fact, it makes the relationships discussed appear much less dramatic than they actually are.

For example, look at Microsoft’s (MSFT) revenue per employee compared to Apple’s (AAPL). WSJ reports MSFT is $786,400/person; AAPL, $2,128,400. The former is 37% of the latter. But for some reason, WSJ communicates the intensity with an area, a two-dimensional measure, whereas intensity is one-dimensional. Our eyes are pulled to view the length of the side of the square as a proxy for the measurement being communicated. The sides of the squares are proportional to √(786,400) and √(2,128,400); therefore, the sides of the squares visually communicate the ratio of the productivity of MSFT:AAPL as 61%. In other words, the chart visually overstates the relative productivity of MSFT's employees compared to that of AAPL's by a factor of 1.62.

If the numbers are confusing there, consider this simpler example. The speed of your car as measured by your speedometer is an intensity. It’s one dimensional. It tells you how many miles (or kilometers, if you’re from most anywhere else outside the US) you can cover in one hour if your car maintains a constant speed. Your speedometer aptly uses a needle to point to the current intensity as a single number. It does not use a square area to communicate your speed. If it did, 60 miles per hour would look only 1.41 times faster than 30 miles per hour instead of the 2 times faster that it really is. The reason is that the sides of the squares used to display speed would have to be proportional to the square roots of the speeds. The square roots of 60 and 30 are 7.75 and 5.48, respectively.
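Here is the distortion arithmetic in a few lines of R, using the figures quoted above:

msft <- 786400             # WSJ's reported revenue per employee for MSFT
aapl <- 2128400            # and for AAPL
msft / aapl                # actual ratio of intensities: about 0.37
sqrt(msft) / sqrt(aapl)    # ratio of the squares' side lengths: about 0.61
sqrt(60) / sqrt(30)        # the speedometer analogy: about 1.41, even though
                           # 60 mph really is 2 times faster than 30 mph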

For your own personal edification, I have corrected the WSJ graph here:

[click image to enlarge]

Do you see, now, how much more dramatic the AAPL employees' productivity is over that of MSFT's?

This may not seem like a big deal to you at the moment, but consider how much quantitative information we communicate graphically. The reason is that, as the cliché goes, a picture is figuratively worth a thousand words. I firmly believe graphical displays of information are powerful methods of communication, and a large part of my professional practice revolves around accurately and succinctly communicating complex analysis in a manner that decision makers can easily consume and digest. But I’m also keenly aware of how analysts and reporters often miscommunicate important information via visual displays, whether by design, inexperience, or trying to be too clever. I see these transgressions all the time in the analyses I’m asked to audit.

The way we communicate information is not just a matter of style for business reporters. We often make prodigious decisions based on information. If information is communicated in a way that distorts the underlying relationships involved, we risk making serious misallocations of scarce resources. This affects every aspect of the nature of our wealth - money, time, and quality of life. The way we communicate information bears fiduciary responsibilities.

For discussion's sake, I ask,

  1. How often have you seen, and maybe even been victimized by, graphical information that miscommunicates important underlying relationships and patterns?
  2. How often have you possibly incorporated ineffective means of graphically communicating important information? (Pie charts, anyone?)

If you want to learn more about the best ways to communicate through the graphical display of quantitative information, I highly recommend these online resources as a starting point:

Labels: , ,

Friday, July 11, 2014

The Value of Knowing What You Do Not Know

Labels: , , ,

Tuesday, February 25, 2014

How Do You Know That? Funny You Should Ask.

During a recent market development planning exercise, my client recognized that his colleagues were making some rather dubious assumptions regarding the customers they were trying to address (e.g., acceptable price, adoption rate, lifecycle, market size), the costs of development, and the costs of support. Although he frequently asked “How do you know that?”, he was met with irritation and mild belligerence from those he asked to justify their assumptions. So, together we devised a simple little routine to force the recognition that assumed facts might be shakier than previously thought.

After bringing the development team members together, we went around the room and asked each person to list the statements they believed to be true, and that must be true, for the program to succeed. We wrote each down as a succinct, declarative statement. Then, after everyone had the opportunity to reflect on the statements, we converted each one to a question simply by changing the periods to question marks.

Before Western explorers proved that the Earth is round, ships used to sail right off the assumed edges.

We then asked the team to supply a statement that answered each question in support of the original statement. Once this was completed, we then appended the dreaded question mark to each of these responses. We repeated this process until no declarative answers could be supplied in response to the questions. The cognitive dissonance among the team members became palpable as they all had to start facing the uncomfortable situation that what they once advocated as fact was largely unsupportable. Many open questions remained. More uncertainty reigned than was previously recognized. The remaining open questions then became the basis for uncertainties in our subsequent modeling efforts in which we examined value tradeoffs in decisions as a function of the quality of information we possessed. You probably won’t be surprised to learn that the team faced even more surprises as the implications of their tenuous assumptions came to light.

I am interested to know how frequently you find yourself participating in planning exercises at work in which key decisions are made on the basis of largely unsupported or untested assumptions. My belief is that such events happen much more often than we care to admit.

I would also be interested to know if the previously described routine works with your colleagues to force awareness of just how tenuous many preconceived notions really are. I outline the steps below for clarity.
  1. Write down everything you believe to be true about the issue or subject at hand. 
  2. Each statement should be a single declarative statement. 
  3. Read each out loud, forcing ownership of the statement.
  4. Convert each statement to a question by changing the period to a question mark.
  5. Again, read each out loud as a question, opening the door to the tentative nature of the original statement.
  6. Supply a statement that you believe to be true that answers each question.
  7. Repeat the steps above until you reach a point with each line of statements-questions where you can no longer supply answers.
You might find that a mind mapping tool such as MindNode or XMind is useful for documenting and displaying the assumptions and branching questions/responses. The visual display may serve to help your team see connections among assumptions that were not previously recognized.

Let me know if you try this and how well it works.

Labels: , ,

Wednesday, January 22, 2014

Can Modeling a Business Work?

A friend on LinkedIn asks, “Can modeling a business work?” I respond:

For now, or at least until The Singularity occurs, the development of business ideas and plans is a uniquely human enterprise that springs from a combination of intuition, goals, and ambitions. That should not mean, however, that we cannot effectively supplement our intuition and planning with aids to management and decision making. While I think human intuition is a very powerful feature of our species, I’m also convinced it can be led astray or corrupted by biases very quickly, particularly amid the complexities that arise as plans turn into real-life execution. This is not a modern realization. The origins of the principles of inventory management, civil engineering, and accounting date back to antiquity. Think of the seagoing Phoenician merchants and the public-works-building Babylonians and Egyptians. In fact, historians now believe that the actual founder of Arthur Andersen LLP was none other than the blind Venetian mathematician and priest, Luca Pacioli (ca. 1494). That's right - that musty odor that emanates from accounting books is due to their being more than 500 years old.

Luca Pacioli doodling circles out of sheer boredom after a day of accounting. I made up the part about his being blind.

Business modeling is a tool similar to accounting in that it aids our thinking in a world whose complexity seems often to exceed the grasp of our comprehension. I look at the value of modeling a business as a means to stress test both the business plan logic and the working assumptions that drive the business plan. In regard to the business plan logic, we're asking if the business has the potential ability to produce the value we think it can; and in regard to the working assumptions, we're testing how sensitively important metrics (i.e., payback time, break-even, required resources, shareholder value) of the business plan respond to conditions in the environment and controllable settings to which our business plan will be subjected.

By obtaining such insights from modeling a business, business leaders can modify business plans by changing policies about pricing, products/services offered, costs targeted for reduction or elimination, and contingency or risk mitigation plans that can be adopted, etc.

However, I recommend awareness of at least three caveats with regard to business modeling:
  1. Think of such models as "what-ifs" more so than precise forecasts. Use the "what if" mindset to make a business plan more robust against the things outside your direct control versus using it to justify a belief in guaranteed success. The latter is almost a surefire approach to failure. 
  2. Always compare more than one plan with a model to minimize opportunity costs. Oftentimes, the best business plans derive from hybrids of two models that show how value can be created and retained for at least two different reasons. 
  3. Avoid overly complex models as much as, maybe more so than, overly simplistic models. Building a requisite model from an influence diagram first is usually the best way to achieve this happy medium before writing the first formula in a spreadsheet or simulation tool. Richer, more complex models that correspond to the real world with the highest degree of precision are usually not useful for a number of reasons:
    • they can be costly to build
    • the value frontier of the insights derived decline relative to the cost to achieve them as the degree of complexity increases
    • they are difficult to maintain and refactor for other purposes
    • they are often used to justify delaying commitment to a decision
    • few people will achieve a shared understanding that is useful for collaborating and execution
A requisite model, on the other hand, should deliver clarity and permit making new and interesting testable predictions or reveal insights about, say, uncertainties, that could be made to work in your favor. Admittedly, though, it takes a lot of practice to achieve this third recommendation, but it should be used as a guiding principle.

Labels: , , ,

Sunday, January 12, 2014

Double, double toil and trouble; Fire burn, and caldron bubble

This was a great article in The Wall Street Journal today.

For me, the key takeaway can be summed up in this quote from Prof. Goetzmann: "Once people buy in, they start to discount evidence that challenges them..." I relate this not only to investing decisions in the market, but also to making organizational decisions--investments in capital projects, new strategies, the next corporate buzz. We've all seen or been a part of the exuberant irrationality that leads organizations into malinvestments.

Let's consider the complementary action--saying "no." Against the tendency toward the irrational "yes, Yes, YES!", learning to say "no" is a very important skill to master. It's probably one of the hardest skills to master when people request something from us that makes us feel important and liked.

I think, however, we always need to be aware that many of our initial reactions are often driven by biases. Reactively saying "no," once we've learned to say it and it becomes easy to do, can emerge from the same biases that urge us unreservedly to say "yes." Both incur their costs: missed opportunity, waste, and rework.

The skill more important to learn than saying "no" is acquiring the skill to consider disconfirming evidence, especially when that evidence challenges our dearest assumptions about what is going to make us rich. Let's not be so quick to say "yes" or smug when we say "no." Rather, let's learn the practice of asking,
  • "what information might disabuse me of my favorite assumptions?"
  • "what biases are preventing me from seeing clearly?"
Failing to learn these, we all too often find ourselves concocting a witches' brew.

Tuesday, September 10, 2013

It's Your Move: Creating Valuable Decision Options When You Don't Know What to Do

The following is the first chapter excerpt from my newly published tutorial.

Business opportunities of moderate to even light complexity often expose decision makers to hundreds, if not tens of thousands, of coordinated decision options that should be considered thoughtfully before making resource commitments. That complexity is just overwhelming! Unfortunately, the typical response is either analysis paralysis or "shooting from the hip," both of which expose decision makers to unnecessary loss of value and risk. This tutorial teaches decision makers how to tame option complexity to develop creative, valuable decision strategies that range from "mild to wild" with three simple thinking tools.

Read more here.

Labels: ,

Wednesday, July 24, 2013

RFP Competitive Price Forecasting Engine

Developing a competitive price in response to an RFP is difficult and fraught with uncertainty about competitor pricing decisions. "Priced to Win" approaches often lead to declining margins. Our approach and tool set allow you to develop a most likely price neutral position that helps you focus more attention on providing "intangible" benefits that differentiate your offering in a way that is more valuable to your potential client.

Labels: , ,

Tuesday, July 23, 2013

Business Case Analysis with R

The following is the first chapter excerpt from my newly published book.

Business Case Analysis with R

A Simulation Tutorial to Support Complex Business Decisions

1.2 Why use R for Business Case Analysis?
Even if you are new to R, you most likely have noticed that R is used almost exclusively for statistical analysis, as it's described at The R Project for Statistical Computing. Most people who use R do not frequently employ it for the type of inquiry for which business case analysts use spreadsheets: selecting projects to implement, making capital allocation decisions, or justifying strategic pursuits. The statistical analysis from R might inform those decisions, but most business case analysts don't employ R for those types of activities.

Obviously, as the title of this document suggests, I am recommending a different approach from the status quo. I'm not just suggesting that R might be a useful replacement for spreadsheets; rather, I'm suggesting that better alternatives to spreadsheets be found for doing business case analysis. I think R is a great candidate. Before I explain why, let me explain why I don't like spreadsheets.

Think about how a spreadsheet communicates information. It essentially uses three layers of presentation:
  1. Tabulation
  2. Formulation
  3. Logic
When we open a spreadsheet, usually the first thing we see are tables and tables of numbers. The tables may have explanatory column and row headers. The cells may have descriptive comments inserted to provide some deeper explanation. Failure to provide these explanatory clues represents more a failing of the spreadsheet developer's communication abilities than a failing of the spreadsheet environment, but even with the best of explanations, the emergent pattern implied by the values in the cells can be difficult to discern. Fortunately, spreadsheet developers can supply graphs of the results, but even those can be misleading chart junk.

To understand how the numbers arise, we might ask about the formulas. By clicking in a cell we can see the formulas used, but unfortunately the situation here is even worse than the prior level of presentation of tables of featureless numbers. Here, we don't see formulas written in a form that reveals underlying meaning; rather, we see formulas constructed by pointing to other cell locations on the sheet. Spreadsheet formulation is inherently tied to the structural presentation of the spreadsheet. This is like saying the meaning of our lives should be dependent on the placement of furniture in our houses.

While the goal of good analysis should not be more complex models, a deeper inquiry into a subject usually does create a need for some level of complexity that exceeds the simplistic. But as a spreadsheet grows in complexity, it becomes increasingly difficult to extend the size of tables (both by the length of the indices that structure them and the number of indices used to configure their dimensionality) as a direct function of its current configuration. Furthermore, if we need to add new tables, choosing where to place them and how to configure them also depends almost entirely on the placement and configuration of previously constructed tables. So, as the complexity of a spreadsheet increases, it naturally leads to less flexibility in the way the model can be represented. It becomes crystallized by the development of its own real estate.

The cell referencing formulation method also increases the likelihood of error propagation because formulas are generally written in a quasi-fractal manner that requires the formula to be written across every element in at least one index of a table's organizing structure. Usually, the first instance of a required formula is written within one element in the table; then, it is copied to all the appropriate adjacent cells. If the first formula is incorrect, all the copies will be, too. If the formula is sufficiently long and complex, reading it to properly debug it becomes very difficult. Really, the formula doesn't have to be that complicated or the model that complex for this kind of failure to occur, as the recent London Whale VaR model and Reinhart-Rogoff Study On Debt debacles demonstrated.[1]

All of this builds to the most important failure of spreadsheets -- the failure to clearly communicate the underlying meaning and logic of the analytic model. The first layer visually presents the numbers, but the patterns in them are difficult to discern unless good graphical representations are employed. The second layer, which is visible only when requested, uses an arcane formulation language that seems inherently irrational compared to the goal of good analysis. The final layer--the logic, the meaning, the essence of the model--is left almost entirely to the inference capability of any user, other than the developer, who happens to need to use the model. The most important layer is the most ambiguous, the least obvious. I think the order should be the exact opposite.

When I bring up these complaints, the first response I usually get is: "ROB! Can't we just eat our dinner without you complaining about spreadsheets again?" But when the population of my dinner company tends to look more like fellow analysts, I get, "So what? Spreadsheets are cheap and ubiquitous. Everyone has one, and just about anyone can figure out how to put numbers in them. I can give my analysis to anyone, and anyone can open it up and read it."

Then I'm logically--no, morally--compelled to point out that carbon monoxide is cheap and ubiquitous, that everyone has secrets, that just about everyone knows how to contribute to the sewage system, that just about everyone can read your diary and add something to it. Free, ubiquitous, and easy to use are all great characteristics of some things in their proper context, but they aren't characteristics that are necessarily universally beneficial.

More seriously, though, I know that what most people have in mind with the common response I receive is the low cost of entry to the use of spreadsheets and the relative ease of use for creating reports (which I think spreadsheets are excellent for, by the way). Considering the shortcomings and failure of spreadsheets based on the persistent errors I've seen in client spreadsheets and the humiliating ones I've created, I think the price of cheap is too high. The answer to the first part of their objection--spreadsheets are cheap--is that R is free. Freer, in fact, than spreadsheets. In some sense, it's even easier to use since the formulation layer can be written directly in a simple text file without intermediate development environments. Of course, R is not ubiquitous, but it is freely available on the internet.

Unlike spreadsheets, R is a programming language with the built-in capacity to operate over arrays as if they were whole objects, a feature that demolishes any justification for the cell-referencing syntax of spreadsheets. Consider the following example.

Suppose we want to model a simple parabola over the interval (-10, 10). In R, we might start by defining an index we call x.axis as an integer series.

x.axis <- -10:10

which looks like this,

 [1] -10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7   8   9  10

when we call x.axis.

To define a simple parabola, we then write a formula that we might define as

parabola <- x.axis^2

which produces, as you might now expect, a series that looks like this:

 [1] 100  81  64  49  36  25  16   9   4   1   0   1   4   9  16  25  36  49  64  81 100

Producing this result in R required exactly two formulas. A typical spreadsheet that replicates this same example requires manually typing in 21 numbers and then 21 formulas, each pointing to the particular value in the series we represented with x.axis. The spreadsheet version produces 42 opportunities for error. Even if we use a formula to create the spreadsheet analog of the x.axis values, the number of opportunities for failure remains the same.

Extending the range of parabola requires little more than changing the parameters in the x.axis definition. No additional formulas need be written, which is not the case if we needed to extend the same calculation in our spreadsheet. There, more formulas need to be written, and the number of potential opportunities for error continues to increase.
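For instance, doubling the interval requires nothing more than redefining the index and re-running the same one-line formula (a small sketch consistent with the example above):

x.axis <- -20:20        # extend the interval; no other edits are needed
parabola <- x.axis^2    # the same formula now covers the wider range
length(parabola)        # 41 values, with no cell-by-cell copying of formulas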

The number of formula errors that are possible in R is directly related to the total number of formula parameters required to correctly write each formula. In a spreadsheet, the number of formula errors is a function of both the number of formula parameters and the number of cell locations needed to represent the full response range of results. Can we make errors in R-based analysis? Of course, but the potential for those errors is exponentially smaller.

As we've already seen, too, R operates according to a linear flow that guides the development of logic. Also, variables can be named in a way that makes sense to the context of the problem[2] so that the program formulation and business logic are more closely merged, reducing the burden of inference about the meaning of formulas for auditors and other users. In Chapter 2, I'll present a style guide that will help you maintain clarity in the definition of variables, functions, and files.

However, while R answers the concerns of direct cost and the propagation of formula errors, its procedural language structure presents a higher barrier to improper use because it demands a more rational, structured logic than spreadsheets do, a rigor that people usually learn from programming and software design. The best aspect of R is that it communicates the formulation and logic layer of an analysis in a more straightforward manner, as the procedural instructions for performing calculations. It preserves the flow of thought that is necessary to move from starting assumptions to conclusions. The numerical layer is presented only when requested, but logic and formulation are more visibly available. As we move forward through this tutorial, I'll explain more about how these features present themselves for effective business case analysis.

1.3 What You Will Learn
This document is a tutorial for learning how to use the statistical programming language R to develop a business case simulation and analysis. I assume you possess at least the skill level of a novice R user.

The tutorial will consider the case of a chemical manufacturing company deciding whether to construct a new chemical reactor and production facility to bring a new compound to market. There are several uncertainties and risks involved, including the possibility that a competitor brings a similar product online. The company must determine the value of the decision to move forward and where it might prioritize its attention to make a more informed and robust decision.

The purpose of the book is not to teach you R in a broad manner. There are plenty of resources that do that well now. Rather, it will attempt to show you how to

  • Set up a business case abstraction for clear communication of the analysis
  • Model the inherent uncertainties and resultant risks in the problem with Monte Carlo simulation
  • Communicate the results graphically
  • Draw appropriate insights from the results
So, while you will not necessarily become a power user of R, you will gain some insights into how to use this powerful language to escape the foolish consistency of spreadsheet dependency. There is a better way.

1.4 What You Will Need
To follow this tutorial, you will need to download and install the latest version of R for your particular OS. R can be obtained here. Since I wrote this tutorial with the near beginner in mind, you will only need the base install of R and no additional packages.

1: You will find other examples of spreadsheet errors at Raymond Panko's website. Panko researches the cause and prevalence of spreadsheet errors.

2: Spreadsheets allow the use of named references, but the naming convention can become unwieldy if sections in an array need different names.

Read more here. Or, if you prefer, find it at Amazon or Scribd.

Labels: , , , , ,

Wednesday, March 06, 2013

Never Tell Me The Odds?

In a previous post, I discussed the meaning of expected value (EV) and how it's useful for comparing the values of choices we could make when the outcomes we face with each choice vary across a range of probabilities. The discussion closed by comparing the choice to play two different games, each with different payoffs and likelihoods. Game 1 returns an EV of $5, even though it could never actually produce that outcome; and Game 2 returns an EV of $4, also being incapable of producing that outcome.

But let's say that you hate it when C-3PO tells you the odds, so you commit to Game 2 because you like the upside potential of $15, and you think the potential loss of $5 is tolerable. After all, Han Solo always beat the odds, right? Well, before you so commit, let me encourage you to look into my crystal ball to show you what the future holds…not just in one future, but many.

Figure 1: Han Solo shakes his finger.
C-3PO: “Sir, the possibility of successfully navigating an asteroid field is approximately 3,720 to 1.”
Han Solo: “Never tell me the odds.”
I set up an Analytica model with the following characteristics. A sequence index (Play Sequence) steps from 1 to 1,000. Over this index I will toss two "coins," one with a Probability of Win of 50% (Game 1), and the other (Game 2) with a Probability of Win of 45%, according to the way I set up the game in my last post. Each Game tosses the coin independently across the Play Sequence, recording the outcome on each step. A Game Reward ($10 for Game 1; $15 for Game 2) is allocated to a win, and a Game Penalty ($0 for Game 1; -$5 for Game 2) is allocated to a loss. Then, I cumulate the net Game Earnings across the Play Sequence to show what value the cumulative earnings might converge on over many repeated game choices. Not only do I play the games over 1000 sequence steps, I also play the sequences across 1000 universes of parallel iterations. From this point forward, I will refer to an "iteration" as the pattern that occurs in one of these parallel universes across the 1000 games it plays in sequence.
Figure 2: Analytica Influence Diagram. After each game step, wins are assigned a reward; losses, a penalty. Net returns are accrued.
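The model above was built in Analytica, but the same experiment is easy to sketch in R (my own minimal version, not the post's model):

set.seed(1977)
n.steps <- 1000   # plays per universe
n.iter  <- 1000   # parallel universes
# Game 1: 50% chance of $10, otherwise $0. Game 2: 45% chance of $15, otherwise -$5.
game1 <- matrix(ifelse(runif(n.steps * n.iter) < 0.50, 10,  0), n.steps, n.iter)
game2 <- matrix(ifelse(runif(n.steps * n.iter) < 0.45, 15, -5), n.steps, n.iter)
cum1 <- apply(game1, 2, cumsum)   # cumulative earnings, one column per universe
cum2 <- apply(game2, 2, cumsum)
mean(cum1[n.steps, ])             # close to $5,000 = 1000 * EV1
mean(cum2[n.steps, ])             # close to $4,000 = 1000 * EV2
mean(cum2[n.steps, ] > cum1[n.steps, ])  # fraction of universes where Game 2 ends ahead;
                                         # it comes out well under one percent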
What do we observe? In one iteration, we see you start off accruing net earnings in excess of the higher valued Game 1 across the first 400 or so games; however, your luck turns. You end up regretting not taking Game 1.
Figure 3: A streak of early wins can deceive your long term anticipations.
In another iteration, you marginally regret not taking Game 1. You can easily imagine that the outcome could be slightly reversed from this, finding yourself happy that Game 2 just beat out Game 1. Would that outcome prove anything about your skill as a gambler?
Figure 4: You might come close to beating the odds. It appears conceivable that you can. Maybe your luck will turn in the next game you play.
But of course, your luck might turn out much worse. In another iteration, you really regret your bravado.
Figure 5: Yikes!
"Of course," you say, "because the probabilities tell me that I might expect some unfortunate outcomes as well as some beneficial ones. But what overlap might there be over all the iterations at play? Is there some universe in which the odds ever really are in my favor?"

Here's what we see. For Game 1, the accrued earnings range from ~$4,500 to ~$5,500 by the 1000th step.
Figure 6: the accrued earnings range from ~$4,500 to ~$5,500 by the 1000th step for Game 1.
For Game 2, the accrued earnings range from ~$3,000 to ~$5,100. Clearly some overlap potential exists out there.
Figure 7: the accrued earnings range from ~$3,000 to ~$5,100 for Game 2.
In fact, in the early stages of the game sequences, the potential for overlap appears to be significant, and there seems to be a set of futures where the overlap persists. You might just make that annoying protocol droid wish he had silenced his electronic voice emulator.

But take a second look. That second-from-the-top band for Game 2 converges on the second-from-the-bottom band in Game 1. These are the upper and lower 5th percentile bands of outcome, respectively.
Figure 8: The final distributions of the two games show that there are some universes in which your luck holds up...a very small amount of up.
When we count how likely it is that Game 2 ends in winning conditions at various points along the way, the perceived benefit in the higher potential reward of Game 2 decays rapidly. Before you even start, the chance that you could be in a better position by step 1000 for taking Game 2 is around half a percent.
Figure 9: The long-term likelihood of maintaining a winning position decays rapidly for Game 2.
In the model, note that the average earnings by step 1,000 for Game 1 are $5,000 (i.e., 1000*EV1) and that the average earnings for Game 2 are $4,000 (i.e., 1000*EV2). It's as if the imputed EV of each game inexorably accumulated over time…a very long time and over many universes.

So it is only in the fantasy of Hollywood that the mere mention of long odds ensures the protagonist's success. Unfortunately, life doesn't always conform to that fantasy. Over a long time and many repeated occasions to play risky games, especially those that afford little opportunity to adjust our position or mitigate our exposure, EV tells us that our potential for regret increases for having chosen the lesser-valued of the two games. Depending on the relative size of the EVs of the two choices, that regret can set in rapidly as the outcome signal implied by the EV begins to overwhelm the short-term lucky outcomes in the random noise of the game.

So how can you know when you will be lucky? You can't. The odds based on short-term observations of good luck will not long be in your favor. Your fate will likely regress to the mean.

(This post was simultaneously published at the Lumina Blog.)


Wednesday, February 27, 2013

Will You Be Mine?

Business is not war. In fact, I'm getting tired of the trope that relies on this analogy. I think it's destructive and counterproductive - just like war.

I can understand the attractiveness of the metaphor, as business sometimes looks like it's getting all Lord of the Flies. Companies come and go, apparently succumbing to the forces of competition. Profits are made and lost. People's jobs, like lives on a battlefield, are on the line. Kill or be killed. It all frequently feels like a zero-sum game.

And war usually is a zero-sum game: one side wins and the other side loses. Well, I'm not even sure that's entirely accurate. War involves losses on both sides, the value of which may actually exceed the estimated value of going to war. In the case of so-called Pyrrhic victories, the victor simply cannot afford to keep engaging in excursions of conquest. Martial conquest does not guarantee profit.

But as similar as business can seem to war, business is not quite the same. Sure, competition is ever present. Contracts are violated. Deals fail to close or are lost to another offeror. There's espionage and subterfuge. But whereas war usually involves two fronts—red versus blue, Joe versus Charlie, Allied versus Axis—business involves at least three fronts: you, the competition, and your customers.

In war, the primary focus rests on the competition, and the goal is to eliminate them, either by all out destruction or by dousing their will to contend. In war, one side usually surrenders to the other. But this is really not the case in business. The primary goal is not to eliminate the competition (it may be counterproductive to do so), but to win the attention of your customers. The real goal of business is to make a transaction in which at least two sides mutually benefit more than if no transaction occurred. The competition is present, and possibly corrosive, but it's not the primary concern.

The competition itself may evolve and satisfy needs and preferences that your own offerings don't satisfy. And so, in this way, business works out to be something more like a complex ecosystem of niche-partitioned agents seeking to sustain their ability to generate ongoing profitable transactions. And yet it's not so much White Fang as much as it is…well…Pride and Prejudice. (Do I lose my man card for saying that?)

That's right. I think the best metaphor for business is romance in which we as suitors vie for the attention of our beloved—the customer. Again, the competition is there, but it's not our primary concern. We have to learn to deal with it and respond to it; however, if our attention on the competition dominates our activities versus our attention on our customers, current and potential, we may wind up winning a fight but losing our reason for existence, like two boys fighting it out in the school yard over a girl who walks away in disgust.

There is so much more to long term success than revenue generation. But success doesn't happen because we have no competition. Success happens because we provide something that satisfies a need better than the alternatives, even if one of those alternatives is nothing more than what our customers are already doing. What those real needs and preferences are is often hard to identify, but we don't find those out by going to war. We discover them the way lovers learn to fulfill each other's needs. While war is dehumanizing and often provides the psychological barriers that permit actions against others we would normally never consider, romance is about fulfillment.

Let me close with this quote from Marc Hedlund's blog, in which Marc shares his thoughts on closing his company, Wesabe:
You can't blame your competitors or your board or the lack of or excess of investment. Focus on what really matters: making users happy with your product as quickly as you can, and helping them as much as you can after that. If you do those better than anyone else out there you'll win.


Wednesday, February 20, 2013

Tighten up the Ship or build an Airplane? - How to decide?


Monday, February 18, 2013

Fooling Ourselves

For some peculiar reason, this NPR article brought back memories of when I was a math and physics teacher.  One of the several perennial questions my students used to ask me was, "When will we ever use this, Mr. B.?" Of course, there was always, "Will this be on the test?"

One of my standard responses to the first question (the second usually received a scowl) was that mathematics (insofar as it is actually useful) provides a great tool for determining if you're being cheated. Learning to use it effectively increases our ability to avoid becoming someone else's stooge.

But the NPR article serves to remind us that often the greatest threat to being cheated comes from within. We all need to learn the "algebra" that helps us overcome our own internal scam.


Tuesday, February 12, 2013

Incite!Sales: Sales Portfolio and Forecasting System

Sales forecasts are notoriously biased, which leads to misallocation of resources and financial surprises. Our sales portfolio & forecasting system removes bias from forecasts to give you a more accurate view of your sales reality so that you can make more informed decisions about opportunities to pursue. http://incitesales.incitedecisiontech.com


Thursday, February 07, 2013

A Brief Explanation of Expected Value

When helping people analyze the risks they face in complex decisions, I frequently receive requests for an explanation of expected value, as expected value is a measure commonly used to compare the value of alternate risky options. I’ve found that by now most people understand the concept of net present value (NPV) rather well, but they still struggle with the concept of expected value (EV)*. Interestingly enough, and fortunately so, the two concepts share some relationship to each other that makes an explanation a little simpler.

NPV is the means by which we consistently compare cash flows shaped differently in time, assuming that money has greater meaning to us when we get it or spend it sooner rather than later. For example, NPV would help us understand the relative value of a net cash stream that makes a small drawdown in early periods but pays it back over five years versus one that makes a larger drawdown in early periods but pays it back over three years.
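
To make that comparison concrete, here is a toy NPV calculation in R. The cash streams and the 10% discount rate below are made-up numbers for illustration only, not figures from the text:

```r
# NPV of a cash stream, with period 0 taken as today
npv <- function(cash_flows, rate) {
  periods <- seq_along(cash_flows) - 1
  sum(cash_flows / (1 + rate)^periods)
}

stream_a <- c(-50, 15, 15, 15, 15, 15)   # small early drawdown, paid back over five years
stream_b <- c(-100, 45, 45, 45)          # larger early drawdown, paid back over three years

npv(stream_a, 0.10)
npv(stream_b, 0.10)
```

Whichever stream yields the higher NPV at your chosen discount rate is the preferred one for that comparison.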

EV is similar: it lets us consistently compare future outcome values that face different probabilities of occurring.

When we do NPV calculations, we don’t anticipate that the final value in our bank account necessarily will equal the NPV calculated. The calculation simply provides a way to make a rational comparison among alternate time-distributed cash streams.

Likewise, when we do EV calculations, we don't anticipate that the realized value necessarily will equal the EV. In fact, in some cases it would be impossible for the realized value to equal it. EV simply provides a way to make a rational comparison among alternate probability-distributed outcomes.

Here’s a simple example. Suppose I offer you two gambles to play in order to win some money. (Not really, of course, because the State of Georgia reserves the right to engage in games of chance but prohibits me from doing so.)

In the first game, there are even odds (probability=50%) that you will win either $10 on the outcome of a head or $0 on a tail.

In the second game, which is a little more complicated, I use a biased coin for which the odds of your winning are slightly less than even, say, 9:11 (probability = 45%). If you win, you gain $15; if you lose, you pay me $5. Which is the better game to play? Believe it or not, the answer depends on how you frame the problem, most notably on your risk tolerance and how many games you get to play. If you can't afford to pay $5 should you lose the second game on the first toss, you're better off going with the first game, because at worst you lose nothing and at best you gain $10. However, if you can afford the possible loss of $5 and you can play the game repeatedly, expected value tells us how to compare the two options.

We calculate EV in the following way: EV = prob(H)*(V|H) + prob(T)*(V|T).

For the first game, EV1 = 0.5*($10) + 0.5*(0) = $5.

For the second game, EV2 = 0.45*($15) – 0.55*($5) = $4.

So, since you prefer $5 over $4 (you do, don’t you?), you should play the first game, even though the potential maximum award is alluringly $5 more in game two than one.
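
For those who prefer to let the computer do the arithmetic, the same comparison takes a couple of lines of R (simply mirroring the calculations above):

```r
ev1 <- 0.50 * 10 + 0.50 * 0       # Game 1: expected value of $5 per play
ev2 <- 0.45 * 15 + 0.55 * (-5)    # Game 2: expected value of $4 per play
c(EV1 = ev1, EV2 = ev2)
```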

But here's the point about the outcomes. At no time in the course of playing either game will you have $5 or $4 in your pocket. Those numbers are simply theoretical values that we use to make a probability-adjusted consistent comparison between two risky options.

In a follow up post, I will describe what your potential winnings could look like if you choose to play either game over many iterations across many parallel universes.

*To be honest, I think part of the persistent problem in understanding comes from the term "expected" itself. Colloquially, when people use and hear this term, they think "anticipated." In discussions about risk and uncertainty, the technical meaning really refers to a probability-weighted average, or mean, value. Unfortunately, I wouldn't wait for us technical types to accommodate common usage. [back]
