Tuesday, 19 August 2014

googleVis 0.5.5 released

Earlier this week we released googleVis 0.5.5 on CRAN. The package provides an interface between R and Google Charts, allowing you to create interactive web charts from R. This is mainly a maintenance release, updating documentation and fixing minor issues.

Screen shot of some of the Google Charts

New to googleVis? Review the examples of all googleVis charts on CRAN.

Perhaps the best known example of the Google Chart API is the motion chart, popularised by Hans Rosling in his 2006 TED talk.

R Code
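
A minimal sketch of a motion chart, assuming the small Fruits demo data set that ships with googleVis; the column names Fruit and Year belong to that data set, and the object name motion is only for illustration.

```r
library(googleVis)

# Fruits is a demo data set bundled with googleVis
motion <- gvisMotionChart(Fruits, idvar = "Fruit", timevar = "Year")

# Opens the chart in the default browser when run interactively
if (interactive()) plot(motion)
```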

Session Info

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base     

other attached packages:
[1] googleVis_0.5.5 WDI_2.4 RJSONIO_1.3-0  

loaded via a namespace (and not attached):
[1] tools_3.1.1

Tuesday, 12 August 2014

GrapheR: A GUI for base graphics in R

How did I miss the GrapheR package?

The author, Maxime Hervé, published an article about the package [1] in the same issue of the R Journal as we did on googleVis. Yet, it took a package update notification on CRANberries for me to look into GrapheR in more detail - 3 years later! And what a wonderful gem GrapheR is.

The package provides a graphical user interface for creating base charts in R. It is ideal for beginners in R, as the user interface is very clear and the code is written alongside into a text file, allowing users to recreate the charts directly in the console.

Adding and changing legends? Messing around with the plotting window settings? It is much easier/quicker with this GUI than reading the help file and trying to understand the various parameters.

Here is a little example using the iris data set.
After loading the package, a call to run.GrapheR() will bring up a window that helps me to create the chart and tweak the various parameters.

Once I am happy with my configuration I hit DRAW and R will create the chart for me.

Finally, I find the underlying R code in a file created by GrapheR. For more details, see also the package vignette, which is available in English, French and German!


Tuesday, 5 August 2014

Thanks to R Markdown: Perhaps Word is an option after all?

In many cases Word is still the preferred file format for collaboration in the office. Yet, it is often a challenge to work with it, not so much because of the software, but because of how it is used and abused. Thanks to R Markdown it is no longer painful to include mathematical notation and R output in Word.

I have been using R Markdown for a while now and have grown very fond of it. Although I am quite happy with PDF and HTML output for basic reports, and with switching to Sweave/LaTeX for more complex documents, I was pleasantly surprised to learn that the new version of RStudio can produce MS Word files directly from R Markdown as well, thanks to the power of pandoc. Perhaps Word is an option after all?
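
Outside RStudio the same conversion can be sketched with the rmarkdown package. This is a minimal example under a few assumptions: the file name report.Rmd and its content are made up for illustration, and the rendering step only runs if rmarkdown and pandoc are installed.

```r
# Three backticks, built here so the chunk fence does not clash with this listing
fence <- strrep("`", 3)

# Write a minimal R Markdown file with a formula and an R chunk
rmd_file <- file.path(tempdir(), "report.Rmd")
writeLines(c(
  "---",
  "title: \"Example report\"",
  "output: word_document",
  "---",
  "",
  "The sample mean is $\\bar{x} = \\frac{1}{n} \\sum_{i=1}^{n} x_i$.",
  "",
  paste0(fence, "{r}"),
  "mean(cars$speed)",
  fence
), rmd_file)

# Render to .docx only if rmarkdown and pandoc are available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  docx_file <- rmarkdown::render(rmd_file, output_format = "word_document",
                                 quiet = TRUE)
}
```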

Tuesday, 29 July 2014

Hit and run. Think Bayes!

At the R in Insurance conference Arthur Charpentier gave a great keynote talk on Bayesian modelling in R. Bayes' theorem on conditional probabilities is strikingly simple, yet incredibly thought-provoking. Here is an example from Daniel Kahneman to test your intuition. But first I have to start with Bayes' theorem.

Bayes' theorem

Bayes' theorem states that given two events \(D\) and \(H\), the probability of \(D\) and \(H\) happening at the same time is the same as the probability of \(D\) occurring, given \(H\), weighted by the probability that \(H\) occurs; or the other way round. As a formula it can be written as:
$$P(H \cap D) = P(H|D) \, P(D) = P(D|H) \, P(H)$$
Or if I rearrange it:
$$P(H|D) = \dfrac{P(D|H) \, P(H)}{P(D)}$$
Imagine \(H\) is short for hypothesis and \(D\) is short for data, or evidence. Then Bayes' theorem states that the probability of a hypothesis given data is the same as the likelihood that we observe the data given the hypothesis, weighted by the prior belief of the hypothesis, normalised by the probability that we observe the data regardless of the hypothesis.

The tricky bit in real life is often to figure out what the hypothesis and data are.

Hit and run accident

This example is taken from Daniel Kahneman's book Thinking, fast and slow [1].
A cab was involved in a hit and run accident at night. Two cab companies, the Green and the Blue, operate in the city. 85% of the cabs in the city are Green and 15% are Blue. A witness identified the cab as Blue. The court tested the reliability of the witness under the same circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colours 80% of the time and failed 20% of the time.

What is the probability that the cab involved in the accident was Blue rather than Green knowing that this witness identified it as Blue?

What is the data here and what is the hypothesis? Intuitively you may think that the proportion of Blue and Green cabs is the data at hand and the witness accusation that a Blue cab was involved in the accident is the hypothesis. However, after some thought I found the following assignment much more helpful, as then \(P(H|D)\) matches the above question:

\(H =\) Accident caused by Blue cab. \(D =\) Witness said the cab was Blue.

With this it is straightforward to get the probabilities \(P(H)=15\%\) and \(P(D|H)=80\%\). But what is \(P(D)\)? Well, when would the witness say that the cab was Blue? Either when the cab was Blue and so the witness is right, or when the cab was actually Green and the witness is incorrect. Thus, following the law of total probability:
$$\begin{align}
P(D) & = P(D|H) P(H) + P(D | \bar{H}) P(\bar{H})\\
& = 0.8 \cdot 0.15 + 0.2 \cdot 0.85 = 0.29
\end{align}$$
Therefore I get \(P(H|D) = 0.8 \cdot 0.15 / 0.29 \approx 41\%\). Thus, even if the witness states that the cab involved in the accident was Blue, the probability of this being true is only \(41\%\).
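
The arithmetic can be checked in a few lines of base R. This is only a sketch; the variable names are chosen for illustration.

```r
# Prior P(H): share of Blue cabs in the city
p_blue <- 0.15
# P(D | H): witness says Blue when the cab is Blue
p_say_blue_given_blue <- 0.80
# P(D | not H): witness says Blue when the cab is Green
p_say_blue_given_green <- 0.20

# Law of total probability gives P(D)
p_say_blue <- p_say_blue_given_blue * p_blue +
  p_say_blue_given_green * (1 - p_blue)

# Bayes' theorem gives P(H | D)
p_blue_given_say_blue <- p_say_blue_given_blue * p_blue / p_say_blue
round(p_blue_given_say_blue, 2)
```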

An alternative way to think about this problem is via a Bayesian Network. The colour of the cab will influence the statement of the witness. In R I can specify such a network using the gRain package [2], which I discussed in an earlier post. Here I provide the distribution of the cabs and the conditional distribution of the witness as an input. After I compile the network, I can again read off the probabilities that a Blue cab was involved, when the witness said so.

R code
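
A minimal sketch of the network described above, assuming a version of gRain that provides setEvidence(). The node names cab and witness and all object names are my own choice for illustration.

```r
library(gRain)

# Distribution of the cabs: 85% Green, 15% Blue
cab <- cptable(~cab, values = c(85, 15), levels = c("Green", "Blue"))

# Conditional distribution of the witness statement given the cab colour;
# values run over the witness levels first, for each level of cab in turn
witness <- cptable(~witness | cab, values = c(80, 20, 20, 80),
                   levels = c("Green", "Blue"))

# Compile the conditional probability tables into a network
net <- grain(compileCPT(list(cab, witness)))

# Enter the evidence that the witness said Blue and query the cab colour
net_blue <- setEvidence(net, nodes = "witness", states = "Blue")
posterior <- querygrain(net_blue, nodes = "cab")
posterior$cab
```

The entry for Blue should agree with the 41% derived by hand.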

Tuesday, 22 July 2014

Notes from the 2nd R in Insurance Conference

The 2nd R in Insurance conference took place last Monday, 14 July, at Cass Business School London.

This one-day conference focused once more on applications in insurance and actuarial science that use R. Topics covered included reserving, pricing, loss modelling, the use of R in a production environment and more.

In the first plenary session, Montserrat Guillen (Riskcenter, University of Barcelona) and Leo Guelman (Royal Bank of Canada, RBC Insurance) spoke about the rise of uplift models. These predictive models are used for improved targeting of policyholders by marketing campaigns, through the use of experimental data. The presenters illustrated the use of their uplift package (available on CRAN), which they have developed for such applications.

Thereafter, the programme consisted of a combination of contributed presentations and lightning talks, as well as a panel discussion on R at the interface of practitioner / academic interaction. The panel, drawn from academia and practice, discussed the efforts made to bridge cultural and communication divides through the use of R, as well as the challenges of developing collaborative business models that respond to market needs and the incentives of academic researchers.

In the closing plenary, Arthur Charpentier (Professor of Actuarial Science at UQAM, Canada) gave a non-Bayesian's account of Bayesian modelling in R. While many are sympathetic to the Bayesian paradigm, it is easy access to computational tools that makes its wider application a realistic prospect. The presenter demonstrated how Bayesian methods can be used to offer alternative analyses of standard actuarial problems.

The audience of the conference included both practitioners (70%) and academics (30%) who are active or interested in the applications of R in Insurance. It was a truly international event with speakers and delegates from many different countries, including USA, Canada, Belgium, Netherlands, Switzerland, Germany, Ireland, Argentina, France, Spain and of course the UK. The coffee breaks and conference dinner at Ironmongers Hall offered great networking opportunities.

All conference presentations are available on request.

Finally, we are grateful to our sponsors Mango Solutions, CYBAEA, PwC and RStudio. This conference would not have been possible without their generous support.

R in Insurance 2015

We are delighted to announce next year's event already. Following two years in London at Cass Business School, the conference will travel across the Channel to Amsterdam, 29 June 2015.

We are looking forward to seeing you there. Further details will be published on www.rininsurance.com.

Tuesday, 15 July 2014

Simple user interface in R to get login details

Occasionally I have to connect to services from R that ask for login details, such as databases. I don't like to store my login details in the R source code file; instead I would prefer to enter them when I execute the code.

Fortunately, I found some old code in a post by Barry Rowlingson that does just that. It uses the tcltk package in R to create a little window in which the user can enter her details, without showing the password. The tcltk package is part of base R, which means the code will run on any operating system. Nice!
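
The idea can be sketched with a handful of tcltk calls. This is not Barry Rowlingson's original code but a minimal illustration; the function name get_login and the layout of the window are my own assumptions.

```r
library(tcltk)

# Open a small window asking for user name and password;
# returns both as a list once the window is closed
get_login <- function(title = "Login") {
  user <- tclVar("")
  pass <- tclVar("")

  win <- tktoplevel()
  tkwm.title(win, title)

  user_entry <- tkentry(win, textvariable = user)
  # show = "*" masks the characters typed into the password field
  pass_entry <- tkentry(win, textvariable = pass, show = "*")

  on_ok <- function() tkdestroy(win)
  ok_button <- tkbutton(win, text = "OK", command = on_ok)

  tkgrid(tklabel(win, text = "Username"), user_entry)
  tkgrid(tklabel(win, text = "Password"), pass_entry)
  tkgrid(ok_button, columnspan = 2)
  tkbind(pass_entry, "<Return>", on_ok)
  tkfocus(user_entry)

  # Block until the user closes the window
  tkwait.window(win)
  list(user = tclvalue(user), password = tclvalue(pass))
}

# Only ask for details in an interactive session
if (interactive()) credentials <- get_login()
```

The returned list can then be passed on to the connection call, so that the details never appear in the source file.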

Session Info

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] tcltk stats graphics grDevices utils datasets methods base

Tuesday, 8 July 2014

googleVis 0.5.3 released

Recently we released googleVis 0.5.3 on CRAN. The package provides an interface between R and Google Charts, allowing you to create interactive web charts from R.

Screen shot of some of the Google Charts

Although this is mainly a maintenance release, I'd like to point out two changes:
  • Default chart width is set to 'automatic' instead of 500 pixels.
  • Intervals for column roles have to end with the suffix ".i", with "i" being an integer. Several interval columns are allowed, see the roles demo and vignette for more details.
Those changes were required to fix the following issues:
  • The order of y-variables in core charts wasn't maintained. Thanks to John Taveras for reporting this bug.
  • Width and height of googleVis charts were only accepted in pixels, although the Google Charts API uses standard HTML units (for example, '100px', '80em', '60', 'automatic'). If no units are specified the number is assumed to be pixels. This has been fixed. Thanks to Paul Murrell for reporting this issue.
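
The naming rule for interval columns can be sketched as follows. The sales figures are made up for illustration, and the column names follow the pattern I recall from the roles demo, so please check the demo and vignette for the authoritative version.

```r
library(googleVis)

# Hypothetical sales figures; interval columns end in ".1", ".2"
df <- data.frame(
  Year = 2013:2014,
  Sales = c(120, 130),
  Sales.interval.1 = c(100, 110),
  Sales.interval.2 = c(140, 150)
)

chart <- gvisBarChart(
  df,
  xvar = "Year",
  yvar = c("Sales", "Sales.interval.1", "Sales.interval.2")
)

# Opens the chart in the default browser when run interactively
if (interactive()) plot(chart)
```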
New to googleVis? Review the demo on CRAN.