March 8th, 2016 | Categories: programming, R | Tags:

Say you have two vectors in R. (These are taken from my tutorial, Simple nonlinear least squares curve fitting in R.)

xdata = c(-2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9)
ydata = c(0.699369,0.700462,0.695354,1.03905,1.97389,2.41143,1.91091,0.919576,-0.730975,-1.42001)

We put these in a data frame with

data = data.frame(xdata=xdata,ydata=ydata)

The data frame looks like this in R

   xdata     ydata
1  -2.00  0.699369
2  -1.64  0.700462
3  -1.33  0.695354
4  -0.70  1.039050
5   0.00  1.973890
6   0.45  2.411430
7   1.20  1.910910
8   1.64  0.919576
9   2.32 -0.730975
10  2.90 -1.420010

Exporting to a .csv file is done using the standard R function, write.csv

write.csv(data,file='example_data.csv')

The resulting .csv file looks like this:

"","xdata","ydata"
"1",-2,0.699369
"2",-1.64,0.700462
"3",-1.33,0.695354
"4",-0.7,1.03905
"5",0,1.97389
"6",0.45,2.41143
"7",1.2,1.91091
"8",1.64,0.919576
"9",2.32,-0.730975
"10",2.9,-1.42001

I don’t want to include the row numbers in my output. To achieve this, we do

write.csv(data,file='example_data.csv',row.names=FALSE)

This gets us a file that looks like this:

"xdata","ydata"
-2,0.699369
-1.64,0.700462
-1.33,0.695354
-0.7,1.03905
0,1.97389
0.45,2.41143
1.2,1.91091
1.64,0.919576
2.32,-0.730975
2.9,-1.42001

I can also remove the quotes around xdata and ydata with quote=FALSE

write.csv(data,file='example_data.csv',row.names=FALSE,quote=FALSE)

giving the file below

xdata,ydata
-2,0.699369
-1.64,0.700462
-1.33,0.695354
-0.7,1.03905
0,1.97389
0.45,2.41143
1.2,1.91091
1.64,0.919576
2.32,-0.730975
2.9,-1.42001
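
If you want to convince yourself that nothing was lost along the way, one option is to read the file back in and compare it with the original data frame. The following is only a sketch: it writes to a temporary file via tempfile() rather than example_data.csv, and the names csv_path and roundtrip are my own choices for illustration.

```r
# Rebuild the example data frame from above
xdata = c(-2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9)
ydata = c(0.699369,0.700462,0.695354,1.03905,1.97389,2.41143,1.91091,0.919576,-0.730975,-1.42001)
data = data.frame(xdata=xdata, ydata=ydata)

# Write to a temporary file (an assumption made here so nothing gets overwritten)
csv_path = tempfile(fileext=".csv")
write.csv(data, file=csv_path, row.names=FALSE, quote=FALSE)

# Read the file back in and compare it with the original
roundtrip = read.csv(csv_path)
print(all.equal(data, roundtrip))
```

all.equal compares numbers with a small tolerance, which is usually what you want when checking values that have been through a text file.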

Changing the separator

Even though they are asking R to write a comma-separated file, some people try to change the separator. Perhaps you’d like to change it to a tab, for example. The following looks reasonable:

write.csv(data,file='example_data.csv',row.names=FALSE,quote=FALSE,sep="\t")

Although it understands what you are trying to do, R will completely ignore your request!

Warning message:
In write.csv(data, file = "example_data.csv", row.names = FALSE,  :
  attempt to set 'sep' ignored

This is because write.csv is designed to ensure that some standard .csv conventions are followed. It’s trying to protect you against yourself!
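
A quick way to see this for yourself is to catch the warning and then look at what actually ended up in the file. This is a minimal sketch rather than anything official: the three-row data frame, the temporary file and the names out_path and messages are all assumptions made for the demonstration.

```r
# A small data frame, just for this demonstration
data = data.frame(xdata=c(-2,0.45,2.9), ydata=c(0.699369,2.41143,-1.42001))
out_path = tempfile(fileext=".csv")

# Collect any warnings while still letting write.csv finish its work
messages = character(0)
withCallingHandlers(
    write.csv(data, file=out_path, row.names=FALSE, quote=FALSE, sep="\t"),
    warning = function(w) {
        messages <<- c(messages, conditionMessage(w))
        invokeRestart("muffleWarning")
    }
)

# Show the warning text and the contents of the file
print(messages)
print(readLines(out_path))
```

The header line of the file should still be separated by a comma, not a tab.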

In the UK, the convention for .csv files is to use . for a decimal point and , as a separator, and that’s the convention that write.csv sticks to. Other countries have a different convention: they use a , for the decimal point and a ; for the separator. The function write.csv2 takes care of that for you.
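
As a rough illustration of that second convention, here is write.csv2 applied to a cut-down version of the data. The three-row data frame, the temporary file and the names csv2_path and roundtrip are assumptions for the sake of the example.

```r
# A cut-down version of the example data
data = data.frame(xdata=c(-2,-1.64,0.45), ydata=c(0.699369,0.700462,2.41143))

# write.csv2 uses ; as the separator and , as the decimal point
csv2_path = tempfile(fileext=".csv")
write.csv2(data, file=csv2_path, row.names=FALSE, quote=FALSE)

# Print the raw contents of the file
writeLines(readLines(csv2_path))

# read.csv2 is the matching reader for this convention
roundtrip = read.csv2(csv2_path)
print(all.equal(data, roundtrip))
```

Reading such a file with plain read.csv would not give you numbers back, so it pays to use the reader that matches the writer.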

If you absolutely must change the separator to something else, make use of write.table instead:

write.table(data,file='example_data.csv',row.names=FALSE,quote=FALSE,sep="\t")

Now, the file will come out like this:

xdata   ydata
-2      0.699369
-1.64   0.700462
-1.33   0.695354
-0.7    1.03905
0       1.97389
0.45    2.41143
1.2     1.91091
1.64    0.919576
2.32    -0.730975
2.9     -1.42001
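
To get a tab-separated file back into R, read.delim is one option since it assumes a header line and a tab separator by default. Again, this is just a sketch: the temporary file and the names tsv_path and roundtrip are my own assumptions.

```r
# Rebuild the example data frame from above
xdata = c(-2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9)
ydata = c(0.699369,0.700462,0.695354,1.03905,1.97389,2.41143,1.91091,0.919576,-0.730975,-1.42001)
data = data.frame(xdata=xdata, ydata=ydata)

# Write a tab-separated file with write.table
tsv_path = tempfile(fileext=".tsv")
write.table(data, file=tsv_path, row.names=FALSE, quote=FALSE, sep="\t")

# read.delim defaults to header=TRUE and sep="\t"
roundtrip = read.delim(tsv_path)
print(all.equal(data, roundtrip))
```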

Further reading: Official write.table documentation in R

February 29th, 2016 | Categories: programming, RSE, walking randomly | Tags:

I sometimes give a talk on basic research software engineering called ‘Is your research correct?’ (slides here). Near the beginning of this talk I refer to what I’ve modestly named ‘Croucher’s Law’

CROUCHER’S LAW
I CAN BE AN IDIOT AND WILL MAKE MISTAKES.

Croucher’s law has a corollary:

YOU ARE NO DIFFERENT!

The idea is that once you accept this aspect of yourself, you can start to adopt working practices that guard against it. In the context of programming, these include things such as automation, version control, testing and so on.

For me, this isn’t just a law for programming — it’s a law that can be applied to every aspect of life. Unlike my parents, for example, I automate the payment of my bills by using direct debit because I know I’ll eventually forget to pay something otherwise.

The genesis of Croucher’s law demonstrates its truth. While I was sitting in a talk given by Jos Martin of The MathWorks, he suddenly stopped and said ‘Mike. We need to talk about Croucher’s law!’ before moving to his next slide, which had the title ‘Martin’s Law’. It was very similar to ‘mine’, and it turns out that I had seen his talk years before and had subconsciously ripped him off!

The fact that I had forgotten this demonstrates to me that Croucher’s law is the stronger result :)

February 8th, 2016 | Categories: programming, python, R, walking randomly | Tags:

While waiting for the rain to stop before heading home, I started messing around with the heart equation described in an old WalkingRandomly post. Playing code golf with myself, I worked to get the code tweetable. In Python:

In R:

I liked the look of the default plot in R so animated it by turning 200 into a parameter that ranged from 1 to 200. The result was this animation:

The code for the above isn’t quite tweetable:

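# Suppress the NaN warnings from sqrt(cos(x)) where cos(x) is negative
options(warn=-1)
# One frame per value of num, from 1 to 200
for(num in seq(1,200,1))
{
    # Zero-padded names (rplot001.jpg, ...) so that *.jpg lists the frames in order
    filename = paste("rplot" ,sprintf("%03d", num),'.jpg',sep='')
    jpeg(filename)
    x=seq(-2,2,0.001)
    y=Re((sqrt(cos(x))*cos(num*x)+sqrt(abs(x))-0.7)*(4-x*x)^0.01)
    plot(x,y,axes=FALSE,ann=FALSE)
    # Close the device so that the frame is written to disk
    dev.off()
}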

This produces a lot of .jpg files which I turned into the animated gif with ImageMagick:

convert -delay 12 -layers OptimizeTransparency -colors 8 -loop 0 *.jpg animated.gif

January 14th, 2016 | Categories: RSE | Tags:

Programmer writes documentation like this

ktpng

User reads documentation like this

giphy (1)

January 11th, 2016 | Categories: RSE | Tags:

The Engineering and Physical Sciences Research Council (EPSRC) is the UK’s main agency for funding research in engineering and the physical sciences. In May 2015, the EPSRC made a funding call for a new type of 5-year fellowship, a Research Software Engineering Fellowship, which indicated their commitment to the national Research Software Engineering movement. I am very happy to announce that I am one of the 7 successful EPSRC Research Software Engineering fellows.

The title of my fellowship project is Building Capability and Support in Research Software. The project summary is below:

“Software is the most prevalent of all the instruments used in modern science” [Goble 2014]. Scientific software is not just widely used [SSI 2014] but also widely developed. Yet much of it is developed by researchers who have little understanding of even the basics of modern software development with the knock-on effects to their productivity, and the reliability, readability and reproducibility of their software [Nature Biotechnology 2015]. Many are long-tail researchers working in small groups – even Big Science operations like the SKA are operationally undertaken by individuals collectively.

Technological development in software is more like a cliff-face than a ladder – there are many routes to the top, to a solution. Further, the cliff face is dynamic – constantly and quickly changing as new technologies emerge and decline. Determining which technologies to deploy and how best to deploy them is in itself a specialist domain, with many features of traditional research.

Researchers need empowerment and training to give them confidence with the available equipment and the challenges they face. This role, akin to that of an Alpine guide, involves support, guidance, and load carrying. When optimally performed it results in a researcher who knows what challenges they can attack alone, and where they need appropriate support. Guides can help decide whether to exploit well-trodden paths or explore new possibilities as they navigate through this dynamic environment.

These guides are highly trained, technology-centric, research-aware individuals who have a curiosity driven nature dedicated to supporting researchers by forging a research software support career. Such Research Software Engineers (RSEs) guide researchers through the technological landscape and form a human interface between scientist and computer. A well-functioning RSE group will not just add to an organisation’s effectiveness, it will have a multiplicative effect since it will make every individual researcher more effective. It has the potential to improve the quality of research done across all University departments and faculties.

My work plan provides a bottom-up approach to providing RSE services that is distinct from yet complements the top-down approach provided by the EPSRC-funded Software Sustainability Institute.

The outcomes of this fellowship will be:

1. Local and National RSE Capability: An RSE Group at Sheffield as a credible roadmap for others pump-priming a UK national research software capability; and a national Continuing Professional Development programme for RSEs.
2. Scalable software support methods: A scalable approach based on “nudging”, to providing research software support for scientific software efficiency, sustainability and reproducibility, with quality-guidelines for research software and for researchers on how best to incorporate research software engineering support within their grant proposals.
3. HPC for long-tail researchers: ‘HPC-software ramps’ and a pathway for standardised integration of HPC resources into Desktop Applications fit for modern scientific computing; a network of HPC-centric RSEs based around shared resources; and a portfolio of new research software courses developed with partners.
4. Communication and public understanding: A communication campaign to raise the profile of research software exploiting high profile social media and online resources, establishing an informal forum for research software debate.

References

[Goble 2014] Goble, C. “Better Software, Better Research”. IEEE Internet Computing 18(5): 4-8 (2014)

[SSI 2014] Hettrick, S. “It’s impossible to conduct research without software, say 7 out of 10 UK researchers” http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers (2014)

[Nature Biotechnology 2015] Editorial “Rule rewrite aims to clean up scientific software”, Nature 520(7547) April 2015

January 4th, 2016 | Categories: matlab | Tags:

I stumbled across a great list of resources about the R programming language recently – a list called awesome-R. The list said it was inspired by awesome-machine-learning which, in turn, was inspired by awesome-PHP. It turns out that there is a whole network of these lists.

I noticed that there wasn’t a list for MATLAB so started the awesome-MATLAB list. Pull Requests are welcome.

December 24th, 2015 | Categories: programming, RSE | Tags:

John D Cook published a great article on automation recently. He discusses the commonly-held idea that the primary reason to automate things is to save time. As anyone who’s actually gone through this process will tell you, this strategy can often backfire and John points to a comic from the ever-wonderful xkcd that illustrates this perfectly.

From XKCD

John suggests that another reason to automate is to save mental energy rather than time and I completely agree! This is a great reason to automate. When you are under pressure to complete a task that has to be done right first time, being able to simply push the big red button and KNOW that it will work is worth a great deal.

Automation as knowledge storage and transfer

Another use of automation is as a way to store and transfer the knowledge of how to get things done.

I work with a huge array of technologies, spending a large part of my working day poring through manuals, documentation, textbooks and Google searches figuring out how to do some task, foo. By the end of the project, I’ll be an expert at doing foo but I know that this expertise won’t last. I’ll soon be moving onto the next project and the next set of technologies, and my hard-won knowledge will leak from my brain-cache as quickly as it was filled.

I often find that the fastest way to distill my knowledge of how to do something is to write a script that automates it. It’s often more concise and quicker to write than documentation and is usually useful to me and possibly others. It also serves as a great launching point for relearning the material if ever I revisit this particular set of technologies and tasks.

Automate to improve your processes

Having an automated script also allows others to easily reproduce what I have done. You want what I have? Run this thing and it’s yours. A favour from me to you!

Initially, this looks and feels like an act of pure altruism. I put in a large amount of hard work and someone else benefits. In my experience, however, payback always comes my way when those who use my work give me feedback on how to do it better.

December 23rd, 2015 | Categories: general math, just for fun, Maple, math software | Tags:

Some numbers have something to say. Take the following, rather huge number, for example:

185325291040682644803531312384041336595151018761127807725763308064246070395230764956468856341399670487
514610052487586323067575687914642829757636555138456145938430191876551756992329818006401775522301219016
237245425891544032218544390861818271526845858747648909382915665997160517028671058273052955697138350617
856171748990490346558484883522495310587304606877332488244886849690319641412147118669050542398759303832
627672479768452329971883073420877438596419179762421854464516060347269129680634374662501202129049727949
71185874579656679344857677824

This number wants to tell you ‘Happy Holidays’; it just needs a little code to help it out. In Maple, this code is:

n := 18532529104068264480353131238404133659515101876112780772576330806424607039523076495646885634139967048751461005248758632306757568791464282975763655513845614593843019187655175699232981800640177552230121901623724542589154403221854439086181827152684585874764890938291566599716051702867105827305295569713835061785617174899049034655848488352249531058730460687733248824488684969031964141214711866905054239875930383262767247976845232997188307342087743859641917976242185446451606034726912968063437466250120212904972794971185874579656679344857677824:
modnew := proc (x, y) options operator, arrow; x-y*floor(x/y) end proc:
tupper := piecewise(1/2 < floor(modnew(floor((1/17)*y)*2^(-17*floor(x)-modnew(floor(y), 17)), 2)), 0, 1):
points := [seq([seq(tupper(x, y), y = n+16 .. n, -1)], x = 105 .. 0, -1)]:
plots:-listdensityplot(points, scaling = constrained, view = [0 .. 106, 0 .. 17], style = patchnogrid, size = [800, 800]);

The result is the following plot

Screen Shot 2015-12-23 at 13.03.12

Thanks to Samir for this one!

The mathematics is based on a generalisation of Tupper’s self-referential formula.
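
For reference, the inequality that the tupper expression in the Maple code encodes can be written as below. This is my transcription of the code rather than a statement of the generalised formula, so treat it as a sketch. The Maple code evaluates it on a 106 by 17 grid of integer points, with x running from 0 to 105 and y running from n to n+16, and gives 0 where the inequality holds and 1 where it does not.

```latex
\frac{1}{2} < \left\lfloor \operatorname{mod}\!\left( \left\lfloor \frac{y}{17} \right\rfloor \, 2^{-17 \lfloor x \rfloor - \operatorname{mod}\left( \lfloor y \rfloor,\, 17 \right)},\; 2 \right) \right\rfloor,
\qquad \operatorname{mod}(a, b) = a - b \left\lfloor \frac{a}{b} \right\rfloor
```

The second definition is the modnew procedure from the code.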

There’s more than one way to send a message with an equation, however. Here’s an image of one I discovered a few years ago — The equation that says Hi

December 13th, 2015 | Categories: general math, just for fun, math software, programming | Tags:

Way back in 2008, I wrote a few blog posts about using mathematical software to generate Christmas cards:

I’ve started moving the code from these to a GitHub repository. If you’ve never contributed to an open source project before and want some practice using git or GitHub, feel free to write some code for a Christmas message along similar lines and submit a Pull Request.

December 8th, 2015 | Categories: Open Data Science | Tags:

Back in October, I wrote about the Open Data Science events we’ve started running at the University of Sheffield. These evening events, held at Sheffield’s The Hide, are attended by researchers, students and the occasional random who have an interest in data science (collectively referred to as Data Hipsters by some).

At their core, these events are just an excuse for researchers from many disciplines to get together and explore common interests in an informal way and they’ve been a great success.

This month, we’ve gone big and not just in the ‘big data’ sense. We have two free data science events: