March 2014 – infinite degrees of freedom

Transforming a data.frame from R into a table in LaTeX

Inserting a table into LaTeX can sometimes be a very tedious job, especially when the table includes many data. But there is hope. A very easy way to transform a data.frame into LaTeX code is to use the xtable package. And here is how it works:

1) Let us create a random data frame to work with:

```{r}
data <- data.frame(
 base=sample(c('A','T','C','G'), size=25, replace=T), present=sample(c('T','F'), size=25, replace=T))
data
```

2) We need to load the xtable package

```{r}
install.packages('xtable')
library(xtable)

3) Now we simply use the xtable() function to create the LaTeX code for our table

```{r}
xtable(data)
```

4) It produces the ready to use LaTeX code that we can imediately copy and paste into our LaTeX file. xtable also offers to add information like the caption and a label or how many digets numbers should have. Let’s try this out and add a further column with random numbers to it.

```{r}
data2 <- data
data2$coefficient <- rnorm(n=25,mean=4,sd=1)
data2
```

xtable_3 — xtable returns LaTeX code (without the ## at the beginning of the line) right inside your R console. Just copy it into a LaTeX document.

This is our new data frame. And now we create the LaTeX code including a caption and label (the label by the way is what you use in your text to refer to.)

```{r}
xtable(data2, caption='A table with important results', label='tab:important_results', digits=3)
```

xtable_4 — This is what the table looks like after pasting the code into a simple LaTeX document.

Nice. The only problem is that xtable puts the caption below the table but in scientific work the caption of a table is always above the table (opposite for figures). Here is the pdf to this example: xtable_example

Combining your R code with text elements – Using RMarkdown

Two weeks ago I presented in the weekly R Group Meeting of the Faculty of Life Sciences (University of Manchester) how to create RMarkdown files. These are files that combine calculation and plots from R with LaTeX-esque text objects. Why would you need to bother? Because it is a stunningly fast way to share your work and results from within R with your colleagues and supervisor. Furthermore, it sets an end to the time where I created a graph or calculated a p-value and lost track which R file contained the original commands. No more searching, you will have it all in place. And here is how it works (at least for Linux and MacOS, tell me if there are problems with Windows)

If you have the program RStudio and the R package knitr (just type in R install.packages(‘knitr’)) installed we we are ready to start.

Open RStudio and click on the new file icon
Select R Markdown to create a new Markdown file
RStudio will now open a template file. It already has a title some text and so called R chunks, which include R code as you would usually have in your R files.
Add and change text and insert your R code (you can of course run the code in line or whole chunks). By the way, you find helpful information by clicking at the question mark icon in the top option bar of the source window (where you are currently entering your text). Click on Markdown Quick Reference to get an overview of useful commands. Click on Using R Markdown to get the full introduction from the RStudio website.
If you are satisfied with the text and the data save the file in a new folder (I recommend a folder because the knitting process ends usually with a .Rmd, .md, and .html file, and maybe more files, but see below).
Finally, click on Knit HTML to create your RMarkdown HTML file. The compiler will go through your text and your R chunks and create a Markdown file (.md) which it uses to return a HTML file.

Okay, let’s have a look how all these file may look like in the end. Copy and paste the following code in your source window and knit it to see what the result looks like.

The Gamma Distribution
========================================================
In _probability theory_ and statistics, the **gamma distribution** is a two-parameter family of continuous probability distributions.
The gamma distribution is described by a shape parameter $k$ and a scale parameter $theta$. Its variance is $Var(X)=k*theta^2$ and its mean is $E(X)=k*theta$. The 'r 1+1'functions look like this in R:
'''{r}
var_gamma <- function(k,t) return(k*(t^2))
mean_gamma <- function(k,t) return(k*t)
```
The next graph shows how the distribution changes with a chaging shape ($k$) value.
```{r, echo=FALSE, fig.width=10, fig.height=5}
plot(sort(rgamma(n=10000,shape=(K <- c(0.1,0.5,2,5,11)),scale=0.4)), type=‘l’, col=1, lwd=2, ylim=c(0,10), xlab=‘sorted values’, ylab=‘’)
for(k in K[-1]){
lines(sort(rgamma(n=10000,shape=k,scale=0.4)), lwd=2)
}
```

The resulting HTML file shout look like this:

It is a minimal example of what you can do. We have there: a title, words in italic and bold letters, and even inline LaTeX code (it is flanked by dollar signs, find possible LaTeX symbol code here). I also included (obviously obsolete but for proof of principle) in line R code. It starts with `r and ends with `. In-between you can write whatever R code you want. You might want to calculate the sum or the mean of a data column or maybe a mean in combination with the standard deviation. You can use the following to do this:

'''{r}
DF <- data.frame(first=runif(10,0,1), second=runif(10,1,2))
DF
'''
The mean value of the first experiment is $'r mean(DF$first)' pm 'r sd(DF$first)'$.

Awesome, you get your results in line without writing any number down. This means also, that this is a very useful tool if you have report that you have to write repeatedly and only the numbers change. Just update the raw data and the report changes itself the next time you knit it. (Note, I combined LaTeX and R in-line code in the above example, yes it is possible).

You may find, that next to the introducing “`{r} there are sometimes further attributes like “`{r setup, echo=FALSE, results=’hide’}. This line for example means the following:

the chunk is called ‘setup’
the code that you entered is not ‘echoed’ in the final document
and all results are ‘hide’ in the final document

You might use this combination when you for example just want to load some function into your file that no-one else should see. The code is still evaluated. If you have code that you don’t want to evaluate (maybe it takes a long time to calculate or you don’t care for it at the moment) just add eval=FALSE. During the knitting process this chunk will not be evaluated.

Similarly, if you have a chunk that you only want to calculate once and want R to remember the results use cache=TRUE. This will save the results in a folder called ‘cache’. Every chunk with the argument cache=TRUE will be saved individually. As long as you don’t change the code in the chunk R will not calculate it again.

If you think that some of the graphs that you created during knitting would fit perfectly in another document, just go to the folder that you created for this document. You will find there a folder called figure, which contains all figures that R created during knitting as png files.

Finally, once you finished polishing your file and you want share it with your supervisor or colleague, it is enough to send the HTML file. All images are part of this HTML file due to the so called URI Scheme.

Now, enjoy having all your code, text and results stored in one place.

Note: In this article I used ‘ instead of ` which is the correct notation, but wordpress always deletes them. Just go for ` or “` instead of ‘ or ”’ and you’ll be fine.

Voice of Young Scientists Workshop

On Friday March 14th the group Voice of Young Scientists (@voiceofyoungsci, which belongs to Sense about Science @senseaboutsci) held a workshop to encourage young researches to communicate their work to the public.

The first session started with Matthew Cobb (@matthewcobb), Susanne Shultz (@Susanne_Shultz) and Jeffrey Forshaw (@jrf1968) three successful scientists that presented a very interesting insight into their relations with public media and journalists.

It is not clear to me whether Jeffrey Forshaw, particle physicist at the University of Manchester, directed his comment ‘scientists are just ordinary people who are curious and want to understand’ towards the much younger scientists or the journalists in the room. However, he made the important point that it is necessary for every researcher to know its stuff. Only when you feel comfortable in your field you can freely talk about it. Also, never try to hide your work. The only way to counter prejudice against scientific work is to be pro-active. Don’t wait for the public to ask you what you do, just start to tell them. And let me add here: even when they don’t ask.

This point was also supported by Matthew Cobb, evolutionary biologist at the University of Manchester. Being pro-active is especially important when you work in controversial areas, like gene modification, medical research, or even evolutionary biology. Writing about your and others work is good for at least two reasons. For one, you train how to write. And the other, your field becomes more understandable by the public and thus less suspicious of doing ‘bad things’.

There's a responsibility to communicate your work so share it widely, and include the human interest angle @dderbyshire #VoYSmediaworkshop

— VoiceofYoungScience (@voiceofyoungsci) March 14, 2014

If you finished your work on a project with a great paper, let your press officer now. They are your best link to all kinds of media and they will also help you to make the points you intended. Susanne Shultz, like Cobb evolutionary biologist at the University of Manchester, recommends to write a press release. Looking towards the journalists, she mentions that it is a lot faster for them to copy-paste an article than to write one from scratch.It might also prevent misinterpretations.

‘I assume’, says Forshaw, ‘media gets it always wrong.’ Therefore, be careful what you say and how you say it. Never try to oversimplify your work, but also don’t oversell it conclude Cobb and Forshaw. And most importantly, never let journalists push you to make a specific comment or statement. Never be afraid to embrace the three simple words that form the foundation of science: ‘I don’t know’ (or if you prefer ‘We don’t know’).

https://twitter.com/voiceofyoungsci/status/444450102057631744

The second session brought BBC journalist Victoria Gill (@Vic_Gill) and freelancing journalist David Derbyshire (@dderbyshire). Their description of their daily work added an interesting angle to the panel in the morning. As representatives of different media (online, telly, print) they are interested novelties and surprises. If you want a big story talk about something big, or as Derbyshire explains it, ‘Of course an elephant is more interesting than a cat, which is more interesting than a mouse, and so on.’

Both Gill and Derbyshire point out that you shouldn’t be afraid of contacting journalists. They know what makes a good story. They are the professional story tellers and are glad to work together with whoever has something interesting to offer. Whether they talk to a PhD student, who actually does the job, or to its professor doesn’t really matter for them. Both are equally unknown to them. ‘As long as they don’t try to take over the editorial part’, states Gill.

"Some science correspondents have a science background… but they do their best to overcome that" All good fun at the #VoYSmediaworkshop

— VoiceofYoungScience (@voiceofyoungsci) March 14, 2014

This was a very interesting and refreshingly different workshop. Many thanks to the organisers and all participants, especially the speakers that took their time and afford to share their experience.

Another blogger blogged about the event here.