Thursday, November 08, 2018

R Packages: leonRdo & inteRest

I have recently developed two packages that accompany the simulation modeling platform I described in my book Business Case Analysis with R: Simulation Tutorials to Support Complex Business Decisions (available at Springer-Nature/Apress and Amazon).

These packages are:

  • leonRdo 0.1.4: provides median Latin hypercube sampling
  • inteRest 1.0: provides basic finance functions
Go here to see how you can install these packages and see a few highlights from their contents.
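For the curious, median Latin hypercube sampling divides the unit interval into n equal-probability strata and samples each at its midpoint rather than at a random point within the stratum. The sketch below illustrates the idea in base R; it is not leonRdo's actual interface, whose function names and arguments may differ.

  # Median Latin hypercube sample: take the midpoint of each of n
  # equal-probability strata, shuffle them, and push the points through
  # an inverse CDF (here, the normal quantile function).
  mlhs <- function(n, qfun = qnorm, ...) {
    p <- (seq_len(n) - 0.5) / n   # stratum midpoints on (0, 1)
    qfun(sample(p), ...)          # shuffle, then invert the CDF
  }
  x <- mlhs(1000, qnorm, mean = 10, sd = 2)
  summary(x)                      # evenly stratified draws from a normal, mean 10, sd 2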

Friday, August 03, 2018

Bayesian Reasoning: Discrete Inference with Sequential Data

Or, One Way I Learned to Quit Believing My Prejudices

In my last article on this topic, I showed that considering background information can play a significant role in helping us make less biased judgments. What I hope to show now is that while we learn by updating the information we have through experience, limited experiences can often lead to prejudices about the way we interpret the world; but even broad and deep experience should rarely lead us to certain conclusions.

To get started, imagine playing a game in which someone asks you to infer the number of sides of a die based on the face numbers that show up in repeated throws of the die. The only information you are given beforehand is that the actual die will be selected from a set of seven dice having these numbers of faces: (4, 6, 8, 10, 12, 15, 18). Assuming you can trust the person who reports the outcome on each throw, after how many rolls of the die will you be willing to specify which die was chosen?

Let's use the R programming language to help us think through the problem. Start by specifying the set of die possibilities such that each number represents the number of sides of a given die. (You might also want to refer to my previous article on Bayesian analysis to familiarize yourself with some of the terminology that follows.)
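As a taste of where the discussion goes, here is a minimal sketch of a single update step (the full article builds this out; the variable names here are mine, not necessarily the article's):

  dice  <- c(4, 6, 8, 10, 12, 15, 18)           # candidate dice, by number of sides
  prior <- rep(1 / length(dice), length(dice))  # uniform prior over the seven dice

  # Likelihood of seeing face x on an s-sided die: 1/s if x <= s, else 0.
  likelihood <- function(x, sides) ifelse(x <= sides, 1 / sides, 0)

  # One Bayesian update after observing a single throw.
  observe <- function(prior, x) {
    post <- prior * likelihood(x, dice)
    post / sum(post)
  }
  round(observe(prior, 9), 3)  # a throw of 9 rules out the 4-, 6-, and 8-sided dice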

To read the entire discussion go here.

Friday, March 09, 2018

Book Release: Business Case Analysis with R

I am happy to announce that "Business Case Analysis with R" has been republished through Springer-Nature/Apress. The title is available at both Springer-Nature/Apress and Amazon.

"This tutorial teaches you how to use the statistical programming language R to develop a business case simulation and analysis. It presents a methodology for conducting business case analysis that minimizes decision delay by focusing stakeholders on what matters most and suggests pathways for minimizing the risk in strategic and capital allocation decisions. Business case analysis, often conducted in spreadsheets, exposes decision makers to additional risks that arise just from the use of the spreadsheet environment."



Contact us if you would like to receive a copy for journalistic or academic review or purchase books in bulk for your organization.

So far, the reviews on Amazon have been great!

★★★★★ This book is a great resource for anyone looking to learn more about running simulations
By Matthew C Marzillo on March 30, 2018
I came across this book while I was looking for a practical resource for applying simulation methods in business settings. While there are many resources on simulation models for academic and research applications, there aren't many that address simulations from a business standpoint. This book is a great resource for anyone looking to learn more about running simulations and getting some real-world experience by test-driving Robert's R code. A technical book that is an easy read...for the price, it's really hard to pass up!

★★★★★ This book has made me a better analyst
By Buffalo Gal on March 27, 2018
Last week I bought this because I have a project merging 85 spreadsheets with R. I am still on chapter 2 but I LOVE THIS BOOK. Let me tell you why.
A. Intuitive Organization. The book begins with a discussion of what motivated the author - to facilitate more accurate, clear and practical analysis by using R instead of complex spreadsheet designs. It discusses guidelines like file architecture and R syntax. It lays out a progressive approach to the Business Case analysis, starting with the basics.
B. Incredible Content. The elegant code is written in base R so it avoids the drama that can come from snazzy packages. It does require some confidence with R. At the same time it inspires me to stretch my skills and try more sophisticated techniques like Monte Carlo and stochastic simulations.
C. Solid delivery. It is easy-to-read even though it is chock full of technical details. It does not have fancy color pictures but it does have simple graphs and visuals that are helpful and easy to understand.

I can't wait to read the rest of this book. It is filled with treasures that will make me more productive, thorough and effective.


★★★★★ Works for any sector and organization
By Russell J Moore on March 25, 2018
I run a niche consulting business focused on education reform. Public education has a long tradition of poor decision-making driven by powers-that-be protectors of the status quo. I am using R - and specifically the tutorials in this book - to identify existing and new measures to include in goals and strategic plans that will actually “move the needle” in public and private K-12 and higher education. I have also shown my copy to friends who do similar critical decision-making in large, private healthcare organizations and have used R before. Just a short skim through this book got them excited about “going deeper” and rejuvenating their analyses and processes. I predict this useful “how to” will take many industries by storm.

★★★★★ R finally enters in the Strategic Planning field.
By Carlos Ortega Fernandez on March 23, 2018
I knew of this book through the author on LinkedIn and could not resist buying it immediately; it deserved it. The subject is not easy: Strategic Planning combines financial concepts, but the more you want to explore alternative scenarios, the more you will require mathematics and probability. This is what this good book is about.

The novel approach it offers is that it is written in a very easy-to-follow R programming language. Perhaps it is the only book on this subject written entirely in R.

Hopefully there will be new extensions of the book that take advantage of the extensive R's optimization libraries as well as its graphical capabilities.


★★★★★ A very accessible introduction to modelling business scenarios.
By Bill Neaves on March 22, 2018 (on the Canadian Amazon site)
A great overview of business analysis and modelling as a discipline. It is a good addition to my library on using R as an alternative to spreadsheets. Well done.

★★★★★ A great start on modeling complex business systems
By JAD_ClimBiz on March 20, 2018
I had the good fortune to find Business Case Analysis with R, by Robert D. Brown III, when I was looking for examples of business simulation software. It turned out to be just what I needed to get started.

The influence diagrams are especially useful in showing how many factors interact to shape the evolution of a complex business system, especially with and without the many possibilities for uncertainty that must be treated probabilistically.

With the guidance provided by the book, I was able to develop a very useful model of climate change impacts on an electric utility, including probabilistic demand and production of hydro and solar power. The numerical estimates were combined with judgments related to subjective criteria including profitability, reliability, and responsibility using the analytical hierarchy process to suggest an optimum generation asset configuration for the 21st century.


★★★★★ Build better models
By Salil A. Athalye on March 17, 2018
I am one of the fortunate people who made a connection with Robert Brown by purchasing his LeanPub publication entitled Business Case Analysis with R – A Simulation Tutorial to Support Complex Business Decisions. The book comes in at under 100 pages, and the price is less than a week’s worth of Espresso shots, but the value is incalculable.

The general received wisdom for most laypeople in this field is: 1) pick a tool, 2) develop a spreadsheet, 3) pick one or more distributions based on the similarity of their shape to your data, and then 4) go wild. Robert, on the other hand, takes great pains to present an effective thought process and workflow and gently guides the reader to help them implement a working example model. At the same time he imparts wisdom from deep expertise in this field and uncovers some of the theoretical underpinnings that inform model builders in the art and practice without drawing out the heavy-duty statistics and mathematics. There are some hidden gems in the R code and some more in the margins. There is an underlying sense of humor and passion for sharing this knowledge evident in the writing.

I am familiar with R but still learned many new tips and tricks. The use of functional programming constructs such as sapply took a while to get used to and in many cases I chose to use loops to help myself while I learned the material. Coming from a 2D spreadsheet world you have to be able to think in terms of data structures and data flows and transformations. Kind of like relearning linear algebra. My tip is to use str() with some of his data structures so you can understand how the indices are traversing through the data structures and how the code is transforming the structures. I must say it does make you appreciate what Excel is doing underneath the hood!

Using this book, I was able to design and implement a full business case simulation for our organization that incorporates uncertainty and risk. This helps us move from single point estimates to ranges and embody uncertainty and risk. And in short order I made it my own by incorporating reproducible research elements using knitr and I have plans to implement a front-end using Shiny. You can spend hundreds of dollars buying college textbooks on this subject matter but many of these books don’t help you actually start implementing a system and using it. That’s why this book is a hidden gem.

And so, why do I feel fortunate? Well, in asking Robert a few questions related to the material I received not only the answers, but also encouragement, perspective and expertise. The combination of all this goodwill flowing back feels like mentoring and I’m very grateful for Robert’s time.

So thank you Robert, for sharing your expertise and wisdom in this book. I highly recommend it to anyone who is not a full time Decision Professional and yet needs to understand the underpinnings of the field and who is ready to move away from Excel spreadsheet hell and leverage the power and flexibility of R.

I look forward to buying new publications from Robert Brown and highly encourage you to buy this book.


★★★★★ Concise tutorials on decision analysis using R language
By AndrewG on March 17, 2018
This book is a re-edited collection of four books originally self-published on leanpub.com. The book contains four main sections: 1) Business Case Analysis with R, using the R programming language to simulate four complex business decisions; 2) It's Your Move, about tackling valuable strategic decisions; 3) Subject Matter Expert Elicitation Guide, to help assess uncertainties when little data is available; and 4) Information Espresso, about using the value of information (VOI) to make clear decisions efficiently. There are also five appendices: A) Deterministic Model, B) Risk Model, C) Simulation and Finance Functions, D) Decision Hierarchy and Strategy Tables, E) VOI Code Samples. These are a collection of R code and Excel templates explained in the body of the text.

In 2017, I had purchased the self-published ebook, mainly looking for R code with concrete examples of Decision Analysis in business. I had purchased previous similarly titled books for Excel but found them lacking in depth. The author takes an imaginary business and describes the inter-related concerns of revenue and costs to calculate a range of net present values. Then the author adds additional assumptions and code to model risks, showing the effect on the previous model. It is instructive to see the effect of changed assumptions on a case-by-case basis. What the mind has a hard time visualizing, the graphic outputs clearly point out. Of course, the R code can be modified to model other businesses.

I anxiously awaited the release of this book and PDF edition, and I am quite pleased with the final product. I found the author's examples and explanations appropriate and clear. The R code, available on the book's website, ran in RStudio without difficulty, producing outputs and plots exactly as published.

If you're looking for a single-source book for concepts and R code on applying Decision Analysis to more complicated, interconnected decisions, I highly recommend this book. The content of the first section alone justifies the price of the entire book. The R code is commented well enough that a programmer could easily translate the algorithms to other programming languages.

Friday, August 04, 2017

Bayesian Reasoning: Gender Inference from a Specimen Measurement

Imagine that we have a population of something composed of two subset populations that, while distinct from each other, share a common characteristic that can be measured along some kind of scale. Furthermore, let’s assume that each subset population expresses this characteristic with a frequency distribution unique to each. In other words, along the scale of measurement for the characteristic, each subset displays varying levels of the characteristic among its members. Now, we choose a specimen from the larger population in an unbiased manner and measure this characteristic for this specific individual. Are we justified in inferring the subset membership of the specimen based on this measurement alone? Bayes’ rule (or theorem), something you may have heard about in this age of exploding data analytics, tells us that we can be so justified as long as we assign a probability (or degree of belief) to our inference. The following discussion provides an interesting way of understanding the process to do this. More importantly, I present how Bayes’ theorem helps us overcome a common thinking failure associated with making inferences from an incomplete treatment of all the information we should use. I’ll use a bit of a fanciful example to convey this understanding along with showing the associated calculations in the R programming language.

Suppose we are aliens from another planet conducting scientific research on this strange group of bipedal organisms called humans...
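(The article develops the calculation in R; the following is a minimal sketch of the underlying arithmetic with made-up parameters, not the article's actual numbers.)

  # Posterior probability that a specimen belongs to subset A, given a
  # measurement m, assuming normal distributions for the characteristic
  # within each subset and a prior based on population shares.
  p_A    <- 0.5                                       # prior share of subset A (assumed)
  dens_A <- function(m) dnorm(m, mean = 178, sd = 8)  # subset A's distribution (assumed)
  dens_B <- function(m) dnorm(m, mean = 165, sd = 7)  # subset B's distribution (assumed)

  posterior_A <- function(m) {
    p_A * dens_A(m) / (p_A * dens_A(m) + (1 - p_A) * dens_B(m))
  }
  posterior_A(172)  # degree of belief that the specimen came from subset A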



To read the entire discussion go here.

Saturday, July 22, 2017

VoyageATL is an online magazine that highlights local small businesses and entrepreneurs and promotes local events. A few days ago they published a small piece on my company, Incite.

Read more VoyageATL - Incite! Decision Technologies.

Book trailers

I made the following "movie" trailers for two of my tutorials to play with the idea of making a teaser that didn't attempt to explain anything, but mostly just to have some fun.

It’s Your Move: Creating Valuable Decision Options When You Don’t Know What to Do


Information Espresso: Using Value of Information for Making Clear Decisions (with support from the R programming language)

Monday, November 14, 2016

Seeking Beta Testers for a Web-based Sales Opportunity Portfolio Analysis Tool



Incite! Decision Technologies has recently developed a simple yet sophisticated web-based sales opportunity portfolio analysis tool that is ready for beta testing. Now we're seeking parties that would be interested in participating at no cost and no obligation.

Specifically, we are looking for progressive sales managers in firms whose sales teams pursue high-value, low-frequency sales. Examples of target firms might be...
  • Engineering, architecture & construction firms
  • Professional service firms
  • Capital equipment manufacturers
  • Start-ups
The purpose of the tool is to provide
  • Improved accuracy of revenue realization and timing forecasts;
  • Guidance on how to allocate resources to maximize the likelihood of deal closure;
  • Guidance on opportunity selection and prioritization.
Ultimately, you will be able to determine if the sales opportunities you are pursuing are worth the time, effort, and resources.

If you are interested in learning more or know someone who might be, please, contact me via LinkedIn message or send me an email from our web form.

Wednesday, November 09, 2016

The Power of Negative Thinking: “How do we know this opportunity is worth the time and effort?”

The sales process is an inherently risky business. It’s difficult to know if and when a deal will close, what clients really want regardless of what they have stated (i.e., the client may have failed to frame their own needs properly), and what competitors offer in price and quality of deliverables.

Compounding the external uncertainty, we often get in our own way by importing certain kinds of biases into our assessment of the value of the sales opportunities at hand. These biases can include…
  • Unwarranted optimism or wishful thinking – personal enthusiasm or a natural disposition to believe that desired outcomes will most likely occur; or, inflating initial estimates of desired outcomes to appear more effective than is warranted;
  • Sand-bagging – under-reporting potential outcomes to appear heroic when better-than-anticipated outcomes materialize;
  • False precision – reporting anticipated outcomes with an unjustified level of certainty, usually as a single-point estimate rather than a range;
  • Availability – recalling values that are memorable, easily accessible, recent, or extreme;
  • Anchoring – using the first “best guess” as a starting point for subsequent estimating;
  • Expert over-confidence – failure of creativity or hubris (e.g., “I know this information and can’t be wrong because I’m the expert.”);
  • Incentives – the SME experiences some benefit or cost in relationship to the outcome of the term being measured, adjusting his estimate in the direction of the preferred outcome;
  • Entitlement – the SME provides an estimate that reinforces his sense of personal value.
Without bias-free assessments in our decisions to actively pursue sales opportunities, it's nearly impossible to know how to allocate sales and support resources effectively to maximize the likelihood of capturing sales in a profitable and efficient manner. In short, when given the opportunity to pursue multiple opportunities with limited resources, it’s often difficult to know if any given opportunity is worth the time.

As odd as it may sound in a culture that seems to demand almost endless optimism, the Power of Negative Thinking actually helps us to overcome our biases as well as inform us how to obtain better information about the external uncertainties we face. By “negative thinking” we do not mean cynicism or toxic nay-saying. Rather, we refer to a process that asks us to consider critically the opposite of what we too easily assume (or wish) to be true. While Negative Thinking could lead us to consider the effects of unfortunate outcomes or conditions (the opposite of desired outcomes) on sales opportunities...

The best laid schemes o’ Mice an’ Salesmen, Gang aft agley


...it could also lead us to consider the possibility of desirable outcomes or conditions (the opposite of the unfortunate) for situations that we often easily dismiss.

No, no, boy, that's no way to make a plane. That'll, I say, that'll never...fly!

But the Power of Negative Thinking goes beyond our merely considering what can happen. We must also consider the “why” and “to what degree” those things could happen. We can account for the “what,” “why,” and “to what degree” in a process called probabilistic reasoning. But that's the second step. The Power of Negative Thinking begins with accurately framing an opportunity, which requires that a sales team answer the following questions:
  • What is the real opportunity? 
  • What are our goals and objectives?
  • What are the client's goals and objectives?
  • What are the decision boundaries and open decisions?
  • What are the sources of uncertainty? 
Answering these questions helps the team know that it has the right reasons in mind to pursue an opportunity and what constraints in its current level of knowledge limit its ability to make unambiguous decisions about which opportunities to pursue and how to go about pursuing them.

Probabilistic reasoning helps a sales team then answer these questions (a brief simulation sketch follows the list):
  • What is the likely range of outcomes for the uncertainties? 
  • What are the effects of uncertainties on sales goals, revenues, and profit? 
  • How much risk do we face with each opportunity; i.e., how much could we lose by pursuing one opportunity over another?
  • What insights can we create for contingency plans or options?
  • How do we prioritize our set of current opportunities?
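The following toy Monte Carlo sketch in R conveys the flavor of that reasoning; every number in it is made up for illustration:

  set.seed(42)
  n       <- 10000
  p_close <- 0.35                                        # assessed chance the deal closes
  revenue <- rlnorm(n, meanlog = log(5e5), sdlog = 0.4)  # uncertain deal size ($)
  cost    <- runif(n, 5e4, 1.5e5)                        # uncertain cost of pursuit ($)
  value   <- ifelse(runif(n) < p_close, revenue, 0) - cost

  mean(value)                   # expected value of pursuing the opportunity
  quantile(value, c(0.1, 0.9))  # 80% prediction interval, which includes losses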

Taking these two steps in a structured way reveals the Power of Negative Thinking so that the sales team can recognize when an opportunity is worth pursuing…or not. Ultimately, not only does the Power of Negative Thinking give the sales team a more accurate assessment of the current state and the possibilities they face, it also helps them develop more effective contingency plans to increase the likelihood of achieving the results their organization—and their clients—desire.

A Decision Analyst's View of Electoral Surprise

I turned off the television last night at 8 PM. Since I had an analytics problem to work on, I didn't want my attention divided, and I knew that clinging to electoral results was more neurotic than helpful. My attention at the moment was not going to change the results. So, I rolled up my sleeves and got to work.

At 12:30 AM, I turned my television back on...


As I watched the polling results roll in and followed the reactions of establishment pundits and the broader hoi polloi (from both sides) in social media, all I could think was, "What is going on here?" Over and over. I mean, Nate Silver was still giving better than 2:1 odds of a Clinton victory just before I turned off the TV. Could the situation really have been that different than assessed? Could things really have changed that quickly? At 4 AM, I finally captured some thoughts that I think should serve as object lessons for all of us, and not just in politics, but in business, too.
  1. Never, ever believe your own spin. Humans love narratives that give them comfort. Unfortunately, almost all narratives are constructed from selected evidence that fits a preferred narrative.
  2. Always question where your biases are coming from. You are biased. Until you recognize it, you will frequently be rudely embarrassed. 
  3. There is no meaningful position in certainty. All beliefs about future events should be treated with degrees of belief. 
  4. Even events that happened in the past are open to interpretation. The real issue about the facts of events is not so much whether events have occurred in the past or whether they will occur in the future. The real issue is our epistemic distance from the events. We generally don't know as much as we think we do.
  5. We condition our beliefs on the evidence at hand. Thinking that a Clinton victory was highly probable was not a bad position to take. It made sense given much of the evidence. BUT, Prob(Clinton win) > 50% does not mean Prob(Clinton win) = 100%! (I'm actually getting tired of explaining this. I'm getting tired of seeing people make this mistake and the effects it has in real life on real people. Probabilities are degrees of belief, not statements of fact.) Always, always, always consider the disconfirming evidence. 
  6. Trump never had an insignificant chance of winning. His victory was always a live possibility. What I see and hear coming from those expressing shocked disappointment about the Clinton loss is that they didn't really explore and consider the edge cases that would lead to a Trump victory. Explore the edge cases. Explore aggressively. Keep exploring. 
  7. Informed accuracy trumps false precision (pun intended). Don't be embarrassed to draw your prediction intervals wide. It's more honest, more informative, and will allow you to do a better job preparing contingency plans. When #6 is performed honestly and aggressively, it should lead you to make your prediction intervals even wider. It's better to be humble and recognize how little you know versus being sure and then being rudely surprised.
  8. The evolving probability of win curves for this election resemble the curves associated with predicting that a given hypothesis among several is true when there are unaccounted for characteristics at play. Suddenly, a seemingly most likely explanation crashes to be replaced by a previously less likely hypothesis as the unrecognized characteristic manifests itself. This is a long way to say people get caught up in false dichotomies (or n-chotomies) for the possible explanations for what really is the case. It is almost always the case that more explanations are available than the limited set we originally conceived.
  9. If something really weird happens and somehow the posted results at 4 AM reverse by the time I wake up, all of the above still applies, maybe more so.

Although Nate Silver was leaning in the wrong direction for predicting the outcome, his odds were actually more realistic and informed than those of many other pollsters, who were giving 19:1 odds or better for a Clinton win.

Tuesday, September 27, 2016

New Book: Business Intelligence with R by Dwight Barry

If you are new to data science and learning the R language, let me recommend this new gem of a book, Business Intelligence with R, by my friendr (the term I just coined to describe R users who help each other), Dwight Barry: https://leanpub.com/businessintelligencewithr

Business Intelligence with R serves as a great cookbook that can save you hours of frustration learning how to get the basics going. Even if you're an old pro, the book serves as a handy desk reference.

Also, please consider the personal note that Dwight sent to all of his beta readers:
Perhaps most importantly, I've also decided to give all proceeds to the Agape Girls Junior Guild, which is a group of middle-school girls who do fundraising for mitochondrial disorder research at Seattle Children's Research Institute and Seattle Children's Hospital. While the minimum price for this book will always be free, if you're the type who likes to "buy the author a coffee," know that your donation is supporting a better cause than my already out-of-control coffee habit. :-)
Business Intelligence with R serves a greater cause.

Wednesday, February 10, 2016

Becoming a Business Analytics Jedi: An application of values-framed decision making

I will be speaking at the Georgia Tech Scheller College of Business on February 18, 2016 on the following topic:
In the current rush to adopt data-driven analytics, discussions about algorithms, programming tools, and big data tend to dominate the practice of business analytics. But we are defined by our choices, our values, and preferences. Data and business analytics that do not start with this recognition actually fail to support the human-centered reason for decision making. This is the way of the Sith. A Jedi, however, knows that framing business analytics in terms of the values and preferences of decision makers, and the uncertainty of achieving those, employs the tools of decision and data science in the wisest way. In this discussion, we will think about the principles of high quality decisions, how to frame a business analytics problem, and learn how to use information in the most efficient way to create value and minimize risk.
The discussion will include a demonstration of the Analytica modeling software.

If you're in the Atlanta area, I would love for you to join me in the discussion.

A special thanks to Dr. Beverly Wright for organizing this event!


Interview with Atlanta Business Radio

Recently, Brian McCarthy and I had some fun being interviewed by Ryan McPherson of Atlanta Business Radio.

You can listen to the interview here or here (starts @19:39).

Monday, January 19, 2015

Teaching the Love of Thinking and Discovery

This post is going to be different from what I've published here before. I'm not going to explain something or attempt to be clever. Instead, I want to share an idea, an open-ended kind of idea for which, at this point, I have no conclusions. First, let me share some background.

The other day I shared a TED Talk by Conrad Wolfram ("Teaching kids real math with computers") as an update on LinkedIn and on my personal Facebook. Please take the time to listen to it if you have not already. I think this is actually vitally important to the well-being of our children and how they gain an education.

My friend and colleague, James Mitchell, made the following comment on the original update: "A great talk. My daughter's life would have been so much easier and better with this approach to teaching math. Wolfram talked about all her complaints." They were my complaints, too. A few of the comments made on my Facebook page included "Math is hard" and "I hate math. I never use it." Apparently, the same complaints are shared by more than just two people.

Curiosity photo by Rosemary Ratcliff, provided courtesy of FreeDigitalPhotos.net

I've been thinking about this TED Talk almost non-stop since I watched it, and I'm beginning to think that one way to achieve the idea here is to provide mathematics education outside of traditional school environments. By that, I don't mean that we should advocate that schools quit teaching math; rather, I think we need to start providing private forums in which kids who are interested in math can learn math in the same way they might learn and participate in extracurricular sports or arts activities that are not offered in a traditional school. I'm currently convinced the program must be private and free from policy-driven curricula that "teach to the test" and arbitrary performance criteria. This is for fun, but a special kind of fun.

What if there were mathematics/programming academies that taught math this way? Maybe it would be a private academy for self-motivated kids who want to learn math, maybe offered after their normal school day or on the weekends. It would follow the approaches advocated by Conrad Wolfram, Paul Lockhart, and Keith Devlin. It would not confer a degree, diploma, or certificate of any sort other than a letter that describes the areas of inquiry and completion of certain milestone projects that were self-selected by the student and mentored by the "professors." For older students, these projects might include publishing papers in journals as well as serve as submissions to more traditional math and science fair projects. This would not be an after-school tutoring program for students who want to improve their grades to passing levels or gain extra points on their college admission tests.

In other words, the immediate purpose of the school would only be to satisfy the natural curiosity of self-motivated students. I believe such an academy would eventually provide economic benefits to its students because it would teach both creative and structured thinking that the market would eventually reward, but the near-term benefit would serve to remediate the destruction of natural curiosity created in our current systems and simply help our youngest achieve what they want to achieve. I envision this as a kind of math zendo where children learn the art driven by intrinsic motivation and encouragement from like-minded but more mature leaders.

Of course, as ideas take hold in our minds, so do the doubts. I think the difficult aspect of this idea would be financing the program. Currently, I see the finances being provided in part by student fees, some voluntary time offered by teachers, and private donations. I would want to structure the student fees such that no interested student would be excluded because they could not afford them.

Much remains to be considered here. Maybe this has been done before or is being done right now. I don't know. Regardless, I welcome any feedback you might offer.

Wednesday, January 07, 2015

An Interesting Christmas Gift

Over the holidays, the New York Times delivered an unusual juxtaposition of headlines and content, and apparent lack of self-awareness, to elicit such a hearty chuckle from its readers as to make the cheerful Old Saint jealous.


[image originally provided by @ddmeyer on Twitter]

To those imbued with the skill of basic high school Algebra 1, the information in the article about Sony’s revenues for the first four days of release of “The Interview” was enough to solve a unit-value problem. If we let R = the number of rentals, and S = the number of sales; then,
  • R + S = 2 million 
  • $6*R + $15*S = $15 million 
With a little quick symbolic manipulation, we see that S = 1/3 million in sales and R = 5/3 million in rentals. That exercise provided just enough mental stimulation and smug self-righteousness to prepare for the day’s sudoku and crossword puzzles. #smug #math
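For the spreadsheet-averse, the same answer drops out of one call to solve() in R, using the article's numbers:

  A <- rbind(c(1,  1),    # R + S = 2 (million transactions)
             c(6, 15))    # 6R + 15S = 15 (million dollars)
  b <- c(2, 15)
  solve(A, b)             # R = 5/3 million rentals, S = 1/3 million sales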

However, not too far into the sudoku puzzle we might realize that a deeper, more instructive problem exists here, a problem that actually permeates all of our daily lives. That problem is related to the precision of the information we have to deal with in planning exercises or, say, garnering market intelligence, etc. A second reading of the article reveals that the sales values, both the total transactions and the total value of them, were reported as approximations. In other words, if the sources at Sony followed some basic rules of rounding, the total number of transactions could range from 1.5 million to 2.4 million, and the total value might range from $14.5 million to $15.4 million. This might not seem like a problem at first consideration. After all, 2 million is in the middleish of its rounding range as is $15 million. Certainly the actual values determined by the simple algebra above point to a good enough approximate answer. Right? Right?

To see if this is true, let’s reassign the formulas above in the following way.
  • R + S = T 
  • $6*R + $15*S = V 
where T = total transactions, and V = total value. Again, with some quick symbolic manipulation, we can get exactly precise answers for R and S across a range of values for T and V.
  • S = 1/9 * V - 2/3 * T 
  • R = T - S 
Doing this we now notice something quite at odds with our intuition - the range of variation between the sales and rentals can be quite large as we see in this scatter plot:



[Fig. 1: The distribution of total transaction values for various combinations of rental and direct sales numbers.]

Here we see that the rental numbers could range from about 800 thousand to 2.4 million, while the direct sales could range from nearly 0 to 700 thousand! Maybe more instructive is to consider the range of the ratio of the rentals to direct sales:


[Fig. 2: The distribution of the ratio of rentals to direct sales for various combinations of rental and direct sales numbers.]

If we blithely assume that the reported values of sales were precise enough to support believing that the actual value of rentals and unit sales were close to our initial result, we could be astoundingly wrong. The range of this ratio could run from about 1.11 (for 1.5 million in total transactions; 15.4 million in total value) to 215 (for 2.4 million in total transactions; 14.5 million in total value). If we were trying to glean market intelligence from these numbers on which to base our own operational or marketing activities, we would face quite a conundrum. What’s the best estimate to use?
Fortunately, we can turn to probabilistic reasoning to help us out. Let’s say we consult a subject matter expert (SME) who gives us a calibrated range and distribution for the sales assumptions such that the range of each distribution stays mostly within the rounding range we specify.
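A sketch of that calculation in R might look like the following; the beta shapes here are stand-ins for whatever the SME actually assessed, so the resulting interval will differ somewhat from the figures discussed below.

  set.seed(7)
  n  <- 1e5
  # Stand-in SME distributions: peaked near the reported values and staying
  # mostly within the rounding ranges (illustrative shapes only).
  Tt <- 1.5 + 0.9 * rbeta(n, 4, 4)   # total transactions, millions (1.5 to 2.4)
  V  <- 14.5 + 0.9 * rbeta(n, 4, 4)  # total value, $ millions (14.5 to 15.4)
  S  <- V / 9 - 2 * Tt / 3           # implied direct sales, millions
  R  <- Tt - S                       # implied rentals, millions
  quantile(R / S, c(0.1, 0.9))       # 80% prediction interval for the ratio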

[Fig. 2a, b: The hypothetical distribution of the (a) total sales transactions and (b) total value assessed by our SME.]

Using the sample values underlying these distributions in our last set of formulas, we observe that in all likelihood – an 80th percentile likelihood – the actual ratio of the rentals to sales falls in a much narrower range – the range of 3 to 9, not 1.11 to 215.

[Fig. 3: The 80th percentile prediction interval for the ratio of the rentals to sales falls in the range of 3 to 9.]

Our manager may push back on this by saying that our SME doesn’t really have the credibility to support the distributions assessed above. She asks, "What if we stick with maximal uncertainty within the range?” In other words, what if, instead of assessing a central tendency around the reported values with declining tails on each side, we assume a uniform distribution along the range of sales values (i.e., each value in the range is equally probable)?
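Under that maximal-uncertainty assumption, only the sampling lines of the earlier sketch change:

  n  <- 1e5
  Tt <- runif(n, 1.5, 2.4)    # every total-transaction count equally likely
  V  <- runif(n, 14.5, 15.4)  # every total-value figure equally likely
  S  <- V / 9 - 2 * Tt / 3
  quantile((Tt - S) / S, c(0.1, 0.9))  # the 80% interval widens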



[Fig. 4a, b: We replace our SME supplied distribution for (a) total sales transactions and (b) total value with one that admits an insufficient reason to suspect that any value in our range is more likely than any other.]

What is the result? Well, we see that even with the assumption of maximal uncertainty, while the most likely range expands by a factor of 2.7 (i.e., the range expanded from 3-9 to 1.7-18), it still remains within a manageable range as the extreme edge cases are ruled out, not as impossible but as fairly unlikely.

[Fig. 5: Replacing our original SME distributions that had peaks with uniform distributions flattens out the distribution of our ratio of rentals to sales, causing the 80th percentile prediction interval to widen. The new range runs from about 1.7 to 18.]

The following graph displays the full range of sales and rental variation that is possible depending on our degrees of belief (as represented by our choice of distribution) about the range of total transactions and total value.

[Fig. 6: A scatter plot that demonstrates the distribution of direct sales and rental combinations as conditioned by our choice of distribution type.]

By focusing on the 80th percentile range of outcomes in the ratio of rentals to sales, we can significantly improve the credible range to estimate the rentals and direct sales from the approximate information we were given.

[Fig. 7: A scatter plot that demonstrates the distribution of direct sales and rental combinations as conditioned by our choice of distribution type, constrained only to those values in the 80th percentile prediction interval.]

Precise? Not within a hair’s breadth, no, but the degree of precision we obtain by incorporating probabilities into our analysis (as opposed to relying on just a best guess with no understanding of the implications of the range of the assumptions) improves by a factor of 13.1 (assuming maximum uncertainty) to 35.2 (trusting our SME). If our own planning depends on an understanding of this sales ratio, we can exercise more prudence in the effective allocation of the resources required to address it. Now, when our manager asks, “How do you know the actual values aren’t near the edge cases?”, we can respond by saying that we don’t know precisely, but using simple algebra combined with probabilities dictates that the actual values most likely are not.

The Zen of Decision Making

I copied the following nineteen zen-like koans from the website devoted to the Python programming language (don't leave yet...this isn't really going to be about programming!).
  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Complex is better than complicated.
  • Flat is better than nested.
  • Sparse is better than dense.
  • Readability counts.
  • Special cases aren't special enough to break the rules.
  • Although practicality beats purity.
  • Errors should never pass silently.
  • Unless explicitly silenced.
  • In the face of ambiguity, refuse the temptation to guess.
  • There should be one-- and preferably only one --obvious way to do it.
  • Although that way may not be obvious at first unless you're Dutch.
  • Now is better than never.
  • Although never is often better than *right* now.
  • If the implementation is hard to explain, it's a bad idea.
  • If the implementation is easy to explain, it may be a good idea.
  • Namespaces are one honking great idea -- let's do more of those!

The koans are supposed to communicate the essence of the guiding principles of programming. Their zen-like fashion is intended to motivate reflection and discussion more so than state explicit rules. In fact, there is a twentieth unstated (Or is it? How's that for zen-like clarity?) principle that you must discover for yourself.



Good aphorisms often find meaning beyond their initial intent. That's the way general, somewhat ambiguous guidance works and why some aphorisms last for so long in common parlance. They're malleable to one's circumstances and provide a kind of structure on which to hinge one's thoughts, concerns, and aspirations (I'm pretty sure horoscopes and Myers Briggs work this way). Some of these aphorisms, maybe all of them, struck me as not only useful as guiding principles for programming but also for decision management in general. Seriously. Go back and consider them again, Grasshopper.

So, let me ask you:
  • In what way is decision management like programming?
  • How would you interpret these principles, if at all, for use in the role of decision making?
  • What do you think is the missing principle?

Monday, October 20, 2014

Moar Accuracies

You've probably heard the saying, "It's better to be mostly accurate than precisely wrong." But what does that mean exactly? Aren't accuracy and precision basically the same thing?

Accuracy relates to the likelihood that outcomes fall within a prediction band or measurement tolerance. A prediction/measurement that comprehends, say, 90% of actual outcomes is more accurate than a prediction/measurement that comprehends only 30%. For example, let's say you repeatedly estimate the number of marbles in several Mason jars mostly full of marbles. An estimate of "more than 75 marbles and less than 300 marbles" is probably going to be correct more often than "more than 100 marbles but less than 120 marbles." You might say that's cheating. After all, you can always make your ranges wide enough to comprehend any range of possibilities, and that is true. But the goal of accuracy is just to be more frequently right than not (within reasonable ranges), and wider ranges accomplish that goal. As I'll show you in just a bit, accuracy is very powerful by itself.
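That power is easy to demonstrate with a quick simulation. Suppose (purely for illustration) the true marble counts hover around 150; the wide estimate is almost always right, the narrow one rarely:

  set.seed(1)
  jars <- rpois(10000, lambda = 150)  # hypothetical true counts per jar
  mean(jars > 75  & jars < 300)       # wide range: correct nearly 100% of the time
  mean(jars > 100 & jars < 120)       # narrow range: correct far less often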

Precision relates to the width of the prediction/measurement band relative to the mean of the prediction/measurement. A precision band that varies around a mean by +/- 50% is less precise than one that varies by +/- 10%. When people think about a precise prediction/measurement, they usually think about one that is both accurate and precise. A target pattern usually helps make a distinction between the two concepts.
The canonical target pattern explanation of accuracy and precision.

The problem is that people jump past accuracy before they attempt to be precise, thinking that the two are synonymous. Unfortunately, unrecognized biases can make precise predictions extremely inaccurate, hence the proverbial saying. Jumping ahead of the all-too-important step of calibrating accuracy is where the "precisely wrong" comes in.

Good accuracy trucks many more miles in most cases than precision, especially when high quality, formal data is sparse. This is because the marginal cost of improving accuracy is usually much less than the marginal costs of improved precision, but the payoff for improved accuracy is usually much greater. To understand this point, take a look again at the target diagram above. The Accurate/Not Precise score is higher than the Not Accurate/Precise score. In practice, a lot of effort is required to create a measurement situation that effectively controls for the sources of noise and contingent factors that swamp efforts to be reasonably more precise. Higher precision usually comes at the cost of tighter control, heightened attention on fine detail, or advanced competence. There are some finer nuances even here in the technical usages of the terms, but these descriptions work well enough for now.

Be careful, though - being more accurate is not just a matter of going with your gut instinct and letting that be good enough. Our gut instinct is frequently the source of the biases that make our predictions look as if we were squiffy when we made them. We usually achieve improved accuracy through the deliberative process of accounting for the causes and sources of the variation (or range of outcomes) we might observe in the events we're trying to measure or predict. The ability to do this reflects the depth of expert knowledge we possess about the system we're addressing, the degree of nuance we can bring to bear to explain the causes of variation, and a recognition of the sources of bias that may affect our predictions. In fact, achieving good accuracy usually begins with recognizing that we may be biased at all (and we usually are) and why.

Once we've achieved reasonable accuracy about some measurement of concern, it might then make sense to improve our precision of the measurement if the payoff is worth the cost of intensified attention and control. In other words, we only need to improve our precision when it really matters.
[Image from FreeDigitalPhotos.net by Salvatore Vuono.]

Monday, September 22, 2014

Are Your Spreadsheets the Problem?

Mr. Patrick Burns at Burns Statistics (no, not that Mr. Burns) provides an excellent overview of the hidden dangers that lurk in your spreadsheets. Guess what. The problems aren't just programming errors and the potential for their harm, but errors that are inherent to the spreadsheet software itself. That's right. Before your analysts even make an error, the errors are already built in. Do you know what's lurking in your spreadsheets? Well, do you?

Before you answer that question, ask yourself these:
  1. What quality assurance procedures does our organization employ to ensure that our spreadsheets are free of errors of math, units conversion, and logic? 
  2. What effort does our organization undertake to make sure that the decision makers and consumers of the spreadsheet analysis comprehend the assumptions, intermediate logic, and results in our spreadsheets? 
  3. How do we ensure that spreadsheet templates (or repurposed spreadsheets or previously loved spreadsheets) are actually contextually coherent with the problem framing and subsequent decisions that the spreadsheets are intended to support? 
Each question actually addresses a hierarchically more important level of awareness and intention in our organizations. The first question addresses the simple rules of math and whether they are satisfied. The second question addresses the level of agreement that the math/logic coordinates in a meaningful way and is capable of supporting valid and reasonable insights, inferences, or accurate predictions about the system or problem it describes, and that everyone understands why. The last question, the most important question, IMHO, addresses whether our analyses point in the right direction of inquiry at all.

My suspicion is that errors of the first level run amok much more than people are willing to admit, but their prevalence is relatively easy to estimate given our knowledge about the rates at which programming errors occur, why they occur, and how they propagate geometrically through spreadsheets. Mr. Burns recommends the programming language R as a better solution than spreadsheets, one easier to adopt than your analysts might currently imagine. I agree. I happen to like R a lot, but I love Analytica as a modeling environment more. But the solution to our spreadsheet modeling problems isn't going to be completely resolved by our choice of software and programming mastery of it.

My greater suspicion is that errors of the second and third level are rarely addressed and pose the greatest level of risk to our organizations because we let spreadsheets (which are immediately accessible) drive our thinking instead of letting good thinking determine the structure and use of our spreadsheets. To rid ourselves of the addiction to spreadsheets and their inherent risks, we have to do the hard work first by starting with question 3 and then working our way down to 1. Otherwise, we're being careless at worst and precisely wrong at best.


(Originally published at LinkedIn.)

Thursday, July 17, 2014

When A Picture is Worth √1000 Words

This morning @WSJ posted a link to the story about Microsoft’s announcement of its plans to lay off 18,000 employees. This picture (as captured on my iPhone)...

[click image to enlarge]

...accompanied the tweet, which is presumably available through their paywall link.

While I’m really sorry to hear about the Microsoft employees who will be losing their jobs, I am simply outraged at the miscommunication in the pictured graph. (This news appeared to me first on Twitter, and the seemingly typical response on Twitter is hyperbolic outrage.)

Here’s the problem as I see it: the graph communicates one-dimensional information with two-dimensional images. By doing so, it distorts the actual intensity of the information the reporters are supposed to be conveying in an unbiased manner. In fact, it makes the relationships discussed appear much less dramatic than they actually are.

For example, look at Microsoft’s (MSFT) revenue per employee compared to Apple’s (AAPL). WSJ reports MSFT is $786,400/person; AAPL, $2,128,400. The former is 37% of the latter. But for some reason, WSJ communicates the intensity with an area, a two-dimensional measure, whereas intensity is one-dimensional. Our eyes are pulled to view the length of the side of the square as a proxy for the measurement being communicated. The sides of the squares are proportionally equal to √(786,400) and √(2,128,400); therefore, the sides of the squares visually communicate the ratio of the productivity of MSFT:AAPL as 61%. In other words, the chart visually overstates the relative productivity of MSFT's employees compared to that of AAPL's by a factor of about 1.65.
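The distortion is easy to reproduce in a few lines of R:

  msft <- 786400                     # revenue per employee, $
  aapl <- 2128400
  msft / aapl                        # true ratio of intensities: ~0.37
  sqrt(msft / aapl)                  # ratio of the squares' sides: ~0.61
  sqrt(msft / aapl) / (msft / aapl)  # visual overstatement: ~1.65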

If the numbers are confusing there, consider this simpler example. The speed of your car as measured by your speedometer is an intensity. It’s one-dimensional. It tells you how many miles (or kilometers, if you’re from most anywhere else outside the US) you can cover in one hour if your car maintains a constant speed. Your speedometer aptly uses a needle to point to the current intensity as a single number. It does not use a square area to communicate your speed. If it did, 60 miles per hour would look only 1.41 times faster than 30 miles per hour instead of the actual 2 times faster that it really is. The reason is that the sides of the squares used to display speed would have to be proportional to the square roots of the speeds. The square roots of 60 and 30 are 7.75 and 5.48, respectively.

For your own personal edification, I have corrected the WSJ graph here:

[click image to enlarge]

Do you see, now, how much more dramatic the AAPL employees' productivity is over that of MSFT's?

This may not seem like a big deal to you at the moment, but consider how much quantitative information we communicate graphically. The reason is that, as the cliché goes, a picture is figuratively worth a thousand words. I firmly believe graphical displays of information are powerful methods of communication, and a large part of my professional practice revolves around accurately and succinctly communicating complex analysis in a manner that decision makers can easily consume and digest. But I’m also keenly aware of how analysts and reporters often miscommunicate important information via visual displays, whether by design, inexperience, or trying to be too clever. I see these transgressions all the time in the analyses I’m asked to audit.

The way we communicate information is not just a matter of style for business reporters. We often make consequential decisions based on information. If information is communicated in a way that distorts the underlying relationships involved, we risk making serious misallocations of scarce resources. This affects every aspect of the nature of our wealth - money, time, and quality of life. The way we communicate information bears fiduciary responsibilities.

For discussion sake I ask,

  1. How often have you seen, and maybe even been victimized by, graphical information that miscommunicates important underlying relationships and patterns?
  2. How often have you possibly incorporated ineffective means of graphically communicating important information? (Pie charts, anyone?)

If you want to learn more about the best ways to communicate through the graphical display of quantitative information, I highly recommend these online resources as a starting point:

Tuesday, February 25, 2014

How Do You Know That? Funny You Should Ask.

During a recent market development planning exercise, my client recognized that his colleagues were making some rather dubious assumptions regarding the customers they were trying to address (i.e., acceptable price, adoption rate, lifecycle, market size, etc.), the costs of development, and the costs of support. Although he frequently asked “How do you know that?”, he was met with irritation and mild belligerence from those he asked to justify their assumptions. So, together we devised a simple little routine to force the recognition that assumed facts might be shakier than previously thought.

After bringing the development team members together, we went around the room and asked for a list of statements that each believed to be true that must be true for the program to succeed. We wrote each down as a succinct, declarative statement. Then, after everyone had the opportunity to reflect on the statements, we converted each to a question simply by converting the periods to question marks.

Before Western explorers proved that the Earth is round, ships used to sail right off the assumed edges.

We then asked the team to supply a statement that answered each question in support of the original statement. Once this was completed, we then appended the dreaded question mark to each of these responses. We repeated this process until no declarative answers could be supplied in response to the questions. The cognitive dissonance among the team members became palpable as they all had to start facing the uncomfortable situation that what they once advocated as fact was largely unsupportable. Many open questions remained. More uncertainty reigned than was previously recognized. The remaining open questions then became the basis for uncertainties in our subsequent modeling efforts in which we examined value tradeoffs in decisions as a function of the quality of information we possessed. You probably won’t be surprised to learn that the team faced even more surprises as the implications of their tenuous assumptions came to light.

I am interested to know how frequently you find yourself participating in planning exercises at work in which key decisions are made on the basis of largely unsupported or untested assumptions. My belief is that such events happen much more often than we care to admit.

I would also be interested to know if the previously described routine works with your colleagues to force awareness of just how tenuous many preconceived notions really are. I outline the steps below for clarity.
  1. Write down everything you believe to be true about the issue or subject at hand. 
  2. Each statement should be a single declarative statement. 
  3. Read each out loud, forcing ownership of the statement.
  4. Convert each statement to a question by changing the period to a question mark.
  5. Again, read each out loud as a question, opening the door to the tentative nature of the original statement.
  6. Supply a statement that you believe to be true that answers each question.
  7. Repeat the steps above until you reach a point with each line of statements-questions where you can no longer supply answers.
You might find that a mind mapping tool such as MindNode or XMind is useful for documenting and displaying the assumptions and branching questions/responses. The visual display may serve to help your team see connections among assumptions that were not previously recognized.

Let me know if you try this and how well it works.

Wednesday, January 22, 2014

Can Modeling a Business Work?

A friend on LinkedIn asks, “Can modeling a business work?” I respond:

For now, or at least until The Singularity occurs, the development of business ideas and plans is a uniquely human enterprise that springs from a combination of intuition, goals, and ambitions. That should not mean, however, that we cannot effectively supplement our intuition and planning with aids to management and decision making. While I think human intuition is a very powerful feature of our species, I'm also convinced it can be led astray or corrupted by biases very quickly, particularly amid the complexities that arise as plans turn into real life execution. This is not a modern realization. The principles of inventory management, civil engineering, and accounting date back to antiquity. Think of the seafaring Phoenician merchants and the public-works-building Babylonians and Egyptians. In fact, historians now believe that the actual founder of Arthur Andersen LLP was none other than the blind Venetian mathematician and priest, Luca Pacioli (ca. 1494). That's right - that musty odor that emanates from accounting books is due to their being more than 500 years old.

Luca Pacioli doodling circles out of sheer boredom after a day of accounting. I made up the part about his being blind.

Business modeling is a tool similar to accounting in that it aids our thinking in a world whose complexity often seems to exceed the grasp of our comprehension. I look at the value of modeling a business as a means to stress test both the business plan logic and the working assumptions that drive the business plan. In regard to the business plan logic, we're asking whether the business has the potential to produce the value we think it can; in regard to the working assumptions, we're testing how sensitively the important metrics of the business plan (e.g., payback time, break-even, required resources, shareholder value) respond to the environmental conditions and controllable settings to which the plan will be subjected.

With such insights from modeling a business, business leaders can modify their plans by changing policies on pricing, the products and services offered, the costs targeted for reduction or elimination, the contingency or risk mitigation plans to adopt, and so on. 
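As a minimal sketch of what this kind of stress test might look like in R, consider the toy what-if grid below. Every name and number in it (the unit price range, adoption rates, market size, and the payback calculation) is a hypothetical illustration, not a prescription.

# Toy what-if grid: how does payback time respond to price and adoption assumptions?
# All values are hypothetical illustrations.
unit.price    <- c(low = 90, base = 100, high = 110)      # $/unit test range
adoption.rate <- c(low = 0.03, base = 0.05, high = 0.08)  # fraction of market/yr
market.size   <- 1e5   # addressable units
annual.cost   <- 2e5   # fixed operating cost, $/yr
investment    <- 1e6   # upfront investment, $

# Annual cash flow for a given price and adoption rate
cash.flow <- function(p, a) p * a * market.size - annual.cost

# Payback time (years) across the what-if grid
payback <- outer(unit.price, adoption.rate,
                 function(p, a) investment / cash.flow(p, a))
round(payback, 1)

Scanning the resulting table shows at a glance which assumption the payback metric responds to most sensitively, which is exactly the conversation a stress test should provoke.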

However, I recommend awareness of at least three caveats with regard to business modeling:
  1. Think of such models as "what-ifs" more than as precise forecasts. Use the "what if" mindset to make a business plan more robust against the things outside your direct control, rather than using it to justify a belief in guaranteed success. The latter is almost a surefire approach to failure. 
  2. Always compare more than one plan with a model to minimize opportunity costs. Often, the best business plans derive from hybrids of two models that show how value can be created and retained for at least two different reasons. 
  3. Avoid overly complex models as much as, maybe more than, overly simplistic models. Building a requisite model from an influence diagram first is usually the best way to achieve this happy medium before writing the first formula in a spreadsheet or simulation tool. Richer, more complex models that correspond to the real world with the highest degree of precision are usually not useful for a number of reasons:
    • they can be costly to build
    • the value frontier of the insights derived declines relative to the cost to achieve them as the degree of complexity increases
    • they are difficult to maintain and refactor for other purposes
    • they are often used to justify delaying commitment to a decision
    • few people will achieve a shared understanding that is useful for collaboration and execution
A requisite model, on the other hand, should deliver clarity and permit new and interesting testable predictions, or reveal insights about, say, uncertainties that could be made to work in your favor. Admittedly, achieving this third recommendation takes a lot of practice, but it should serve as a guiding principle.

Sunday, January 12, 2014

Double, double toil and trouble; Fire burn, and caldron bubble

This was a great article in The Wall Street Journal today.

For me, the key takeaway can be summed up in this quote from Prof. Goetzmann: "Once people buy in, they start to discount evidence that challenges them..." I relate this not only to investing decisions in the market, but also to organizational decisions--investments in capital projects, new strategies, the next corporate buzz. We've all seen or been a part of the exuberant irrationality that leads organizations into malinvestments.

Let's consider the complementary action--saying "no." Against the tendency toward the irrational "yes, Yes, YES!", learning to say "no" is a very important skill, and probably one of the hardest to master when people request something from us that makes us feel important and liked.

I think, however, we always need to be aware that many of our initial reactions are often driven by biases. Reactively saying "no," once we've learned to say it and it becomes easy to do, can emerge from the same biases that urge us unreservedly to say "yes." Both incur their costs: missed opportunity, waste, and rework.

More important than learning to say "no" is acquiring the skill of considering disconfirming evidence, especially when that evidence challenges our dearest assumptions about what is going to make us rich. Let's not be so quick to say "yes," or smug when we say "no." Rather, let's learn the practice of asking,
  • "what information might disabuse me of my favorite assumptions?"
  • "what biases are preventing me from seeing clearly?"
Failing to learn these, we all too often find ourselves concocting a witches' brew.

Tuesday, September 10, 2013

It's Your Move: Creating Valuable Decision Options When You Don't Know What to Do

The following is the first chapter excerpt from my newly published tutorial.

Business opportunities of moderate or even light complexity often expose decision makers to hundreds, if not tens of thousands, of coordinated decision options that should be considered thoughtfully before making resource commitments. That complexity is just overwhelming! Unfortunately, the typical response is either analysis paralysis or "shooting from the hip," both of which expose decision makers to unnecessary loss of value and risk. This tutorial teaches decision makers how to tame option complexity to develop creative, valuable decision strategies that range from "mild to wild" with three simple thinking tools.


Read more here.

Wednesday, July 24, 2013

RFP Competitive Price Forecasting Engine

Developing a competitive price in response to an RFP is difficult and fraught with uncertainty about competitors' pricing decisions. "Priced to win" approaches often lead to declining margins. Our approach and tool set allow you to develop a most likely price-neutral position, helping you focus more attention on providing the "intangible" benefits that differentiate your offering in ways more valuable to your potential client.
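To give a flavor of the idea, here is a minimal sketch of the kind of simulation involved. The competitor set, distributions, and parameters below are hypothetical illustrations, not our actual engine.

# Toy sketch: simulate uncertain competitor bids and locate a price-neutral point.
# All distributions and parameters are hypothetical illustrations.
set.seed(42)
n.trials <- 10000

# Three hypothetical competitors; bids indexed to our cost base
comp.a <- rlnorm(n.trials, meanlog = log(1.00), sdlog = 0.10)
comp.b <- rlnorm(n.trials, meanlog = log(1.05), sdlog = 0.15)
comp.c <- rlnorm(n.trials, meanlog = log(0.95), sdlog = 0.08)

lowest.bid <- pmin(comp.a, comp.b, comp.c)

# Probability of winning on price alone as a function of our bid
our.bids <- seq(0.80, 1.20, by = 0.01)
p.win    <- sapply(our.bids, function(b) mean(b < lowest.bid))

# The price-neutral neighborhood: where winning on price is roughly a coin flip,
# so differentiation must come from the intangible benefits instead
our.bids[which.min(abs(p.win - 0.5))]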

Tuesday, July 23, 2013

Business Case Analysis with R

The following is the first chapter excerpt from my newly published book.

Business Case Analysis with R

A Simulation Tutorial to Support Complex Business Decisions


1.2 Why use R for Business Case Analysis?
Even if you are new to R, you have most likely noticed that R is used almost exclusively for statistical analysis, as it's described at The R Project for Statistical Computing. Most people who use R do not employ it for the kind of inquiry for which business case analysts use spreadsheets: selecting projects to implement, making capital allocation decisions, or justifying strategic pursuits. Statistical analysis from R might inform those decisions, but most business case analysts don't use R for these activities.

Obviously, as the title of this document suggests, I am recommending a different approach from the status quo. I'm not just suggesting that R might be a useful replacement for spreadsheets; rather, I'm suggesting that better alternatives to spreadsheets be found for doing business case analysis. I think R is a great candidate. Before I explain why, let me explain why I don't like spreadsheets.

Think about how a spreadsheet communicates information. It essentially uses three layers of presentation:
  1. Tabulation
  2. Formulation
  3. Logic
When we open a spreadsheet, usually the first thing we see is tables and tables of numbers. The tables may have explanatory column and row headers. The cells may have descriptive comments inserted to provide some deeper explanation. Failure to provide these explanatory clues represents more a failing of the spreadsheet developer's communication abilities than of the spreadsheet environment, but even with the best of explanations, the emergent pattern implied by the values in the cells can be difficult to discern. Fortunately, spreadsheet developers can supply graphs of the results, but even those can be misleading chart junk.

To understand how the numbers arise, we might ask about the formulas. By clicking in a cell we can see the formulas used, but unfortunately the situation here is even worse than the prior level of presentation of tables of featureless numbers. Here, we don't see formulas written in a form that reveals underlying meaning; rather, we see formulas constructed by pointing to other cell locations on the sheet. Spreadsheet formulation is inherently tied to the structural presentation of the spreadsheet. This is like saying the meaning of our lives should be dependent on the placement of furniture in our houses.

While the goal of good analysis should not be more complex models, deeper inquiry into a subject usually does create a need for some level of complexity beyond the simplistic. But as a spreadsheet grows in complexity, it becomes increasingly difficult to extend the size of its tables (both in the length of the indices that structure them and in the number of indices that configure their dimensionality), precisely because of the spreadsheet's current configuration. Furthermore, if we need to add new tables, choosing where to place them and how to configure them also depends almost entirely on the placement and configuration of previously constructed tables. So as the complexity of a spreadsheet increases, it naturally leads to less flexibility in the way the model can be represented. It becomes crystallized by the development of its own real estate.

The cell-referencing formulation method also increases the likelihood of error propagation, because formulas are generally written in a quasi-fractal manner that requires the formula to be repeated across every element in at least one index of a table's organizing structure. Usually, the first instance of a required formula is written within one element in the table; then it is copied to all the appropriate adjacent cells. If the first formula is incorrect, all the copies will be, too. If the formula is sufficiently long and complex, reading it to properly debug it becomes very difficult. Really, the formula doesn't have to be that complicated, nor the model that complex, for this kind of failure to occur, as the recent London Whale VaR model and Reinhart-Rogoff study-on-debt debacles demonstrated.[1]

All of this builds to the most important failure of spreadsheets -- the failure to clearly communicate the underlying meaning and logic of the analytic model. The first layer visually presents the numbers, but the patterns in them are difficult to discern unless good graphical representations are employed. The second layer, which is visible only when requested, uses an arcane formulation language that seems inherently irrational compared to the goal of good analysis. The final layer--the logic, the meaning, the essence of the model--is left almost entirely to the inference capability of any user, other than the developer, who happens to need to use the model. The most important layer is the most ambiguous, the least obvious. I think the order should be the exact opposite.

When I bring up these complaints, the first response I usually get is: "ROB! Can't we just eat our dinner without you complaining about spreadsheets again?" But when the population of my dinner company tends to look more like fellow analysts, I get, "So what? Spreadsheets are cheap and ubiquitous. Everyone has one, and just about anyone can figure out how to put numbers in them. I can give my analysis to anyone, and anyone can open it up and read it."

Then I'm logically--no, morally--compelled to point out that carbon monoxide is cheap and ubiquitous, that everyone has secrets, that just about everyone knows how to contribute to the sewage system, that just about everyone can read your diary and add something to it. Free, ubiquitous, and easy to use are all great characteristics of some things in their proper context, but they aren't characteristics that are necessarily universally beneficial.

More seriously, though, I know that what most people have in mind with this common response is the low cost of entry to spreadsheets and their relative ease of use for creating reports (which I think spreadsheets are excellent for, by the way). But considering the persistent errors I've seen in client spreadsheets, and the humiliating ones I've created myself, I think the price of cheap is too high. The answer to the first part of the objection--spreadsheets are cheap--is that R is free. Freer, in fact, than spreadsheets. In some sense it's even easier to use, since the formulation layer can be written directly in a simple text file without an intermediate development environment. Of course, R is not ubiquitous, but it is freely available on the internet.

Unlike spreadsheets, R is a programming language with the built-in capacity to operate on arrays as if they were whole objects, a feature that demolishes any justification for the cell-referencing syntax of spreadsheets. Consider the following example.

Suppose we want to model a simple parabola over the interval (-10, 10). In R, we might start by defining an index we call x.axis as an integer series.

x.axis <- -10:10

which looks like this,

[1] -10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7   8   9  10

when we call x.axis.

To define a simple parabola, we then write a formula that we might define as

parabola <- x.axis^2

which produces, as you might now expect, a series that looks like this:

[1] 100  81  64  49  36  25  16   9   4   1   0   1   4   9  16  25  36  49  64  81 100

Producing this result in R required exactly two formulas. A typical spreadsheet that replicates this same example requires manually typing in 21 numbers and then 21 formulas, each pointing to the particular value in the series we represented with x.axis. The spreadsheet version produces 42 opportunities for error. Even if we use a formula to create the spreadsheet analog of the x.axis values, the number of opportunities for failure remains the same.

Extending the range of parabola requires little more than changing the parameters in the x.axis definition. No additional formulas need be written, which is not the case if we needed to extend the same calculation in our spreadsheet. There, more formulas need to be written, and the number of potential opportunities for error continues to increase.
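For instance, using the same definitions as above:

# Only the index definition changes; the parabola formula is untouched.
x.axis   <- -20:20
parabola <- x.axis^2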

The number of formula errors possible in R is directly related to the total number of parameters required to write each formula correctly. In a spreadsheet, the number of possible formula errors is a function of both the number of formula parameters and the number of cell locations needed to represent the full range of results. Can we make errors in R-based analysis? Of course, but the potential for those errors is dramatically smaller.

As we've already seen, R operates according to a linear flow that guides the development of logic. Variables can also be named in a way that makes sense in the context of the problem[2], so that the program formulation and the business logic are more closely merged, reducing the burden of inference about the meaning of formulas for auditors and other users. In Chapter 2, I'll present a style guide that will help you maintain clarity in the definition of variables, functions, and files.
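As a small, hypothetical illustration of such naming (the variables here are mine, not drawn from the book's case study):

# The business logic reads directly from the names, not from cell coordinates.
units.sold    <- 5000
unit.price    <- 12.50
unit.cost     <- 7.25
gross.revenue <- units.sold * unit.price
gross.margin  <- gross.revenue - units.sold * unit.cost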

However, while R answers the concerns of direct cost and the propagation of formula errors, its procedural structure presents a higher barrier to improper use: it demands the more rational, structured logic that people usually learn from programming and software design, a rigor that spreadsheets do not enforce. The best aspect of R is that it communicates the formulation and logic layers of an analysis in a straightforward manner, as procedural instructions for performing calculations. It preserves the flow of thought needed to move from starting assumptions to conclusions. The numerical layer is presented only when requested, while logic and formulation remain the most visible. As we move forward through this tutorial, I'll explain how these features lend themselves to effective business case analysis.

1.3 What You Will Learn
This document is a tutorial for learning how to use the statistical programming language R to develop a business case simulation and analysis. I assume you possess at least the skill level of a novice R user.

The tutorial will consider the case of a chemical manufacturing company deciding whether to construct a new chemical reactor and production facility to bring a new compound to market. Several uncertainties and risks are involved, including the possibility that a competitor will bring a similar product online. The company must determine the value of the decision to move forward and where to prioritize its attention to make a more informed and robust decision.
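To give a feel for the approach before we begin, here is a minimal sketch with invented numbers; the book develops a much fuller and more careful model than this.

# Toy Monte Carlo: value of the new reactor under uncertainty, including the
# chance that a competitor enters and erodes price. All figures are hypothetical.
set.seed(123)
n <- 10000

capex       <- runif(n, 40e6, 60e6)                        # construction cost, $
peak.demand <- rlnorm(n, meanlog = log(5e4), sdlog = 0.3)  # tonnes/yr
price       <- runif(n, 900, 1100)                         # $/tonne
var.cost    <- runif(n, 500, 650)                          # $/tonne

competitor.enters <- rbinom(n, 1, 0.4) == 1                # 40% chance of entry
price[competitor.enters] <- 0.8 * price[competitor.enters] # price erosion

annual.margin <- (price - var.cost) * peak.demand
npv <- -capex + annual.margin * sum(1 / 1.1^(1:10))        # crude 10-yr NPV at 10%

mean(npv)                        # expected value of the decision
mean(npv < 0)                    # probability the project destroys value
quantile(npv, c(0.1, 0.5, 0.9))  # an 80% credible range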

The purpose of the book is not to teach you R in a broad manner. There are plenty of resources that do that well now. Rather, it will attempt to show you how to

  • Set up a business case abstraction for clear communication of the analysis
  • Model the inherent uncertainties and resultant risks in the problem with Monte Carlo simulation
  • Communicate the results graphically
  • Draw appropriate insights from the results
So, while you will not necessarily become a power user of R, you will gain some insights into how to use this powerful language to escape the foolish consistency of spreadsheet dependency. There is a better way.

1.4 What You Will Need
To follow this tutorial, you will need to download and install the latest version of R for your particular OS. R can be obtained here. Since I wrote this tutorial with the near beginner in mind, you will only need the base install of R and no additional packages.


Notes
1: You will find other examples of spreadsheet errors at Raymond Panko's website. Panko researches the cause and prevalence of spreadsheet errors.

2: Spreadsheets allow the use of named references, but the naming convention can become unwieldy if sections in an array need different names.


Read more here. Or, if you prefer, find it at Amazon or Scribd.