
How to organize large R programs?

When I undertake an R project of any complexity, my scripts quickly get long and confusing.

What are some practices I can adopt so that my code will always be a pleasure to work with? I'm thinking about things like:

- Placement of functions in source files
- When to break something out to another source file
- What should be in the master file
- Using functions as organizational units (whether this is worthwhile given that R makes it hard to access global state)
- Indentation / line break practices:
  - Treat ( like {?
  - Put things like )} on 1 or 2 lines?
Basically, what are your rules of thumb for organizing large R scripts?

2 Answers

I like putting each piece of functionality in its own file.

But I don't like R's package system. It's rather hard to use.

I prefer a lightweight alternative: place a file's functions inside an environment (what most other languages call a "namespace") and attach it. For example, I made a 'util' group of functions like so:

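util = new.env()
util$bgrep = function [...]
util$timeit = function [...]

# detach any copy left by an earlier source(), then attach the fresh one
while("util" %in% search())
  detach("util")
attach(util)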

This is all in a file util.R. When you source it, you get the environment 'util', so you can call util$bgrep() and the rest. The attach() call also lets you call bgrep() and the others directly. If you didn't put all those functions in their own environment, they would pollute the interpreter's top-level namespace (the one that ls() shows).
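Here is a self-contained version of util.R that you can run end to end. Only new.env(), attach(), detach() and search() come from the answer. The function bodies are invented for illustration, because the answer elides them.

```r
# util.R: a runnable sketch of the environment-plus-attach pattern.
# The bodies of bgrep() and timeit() are hypothetical stand-ins;
# the answer does not show the real ones.
util <- new.env()

# Hypothetical: return the elements of x that match a regex pattern
util$bgrep <- function(pattern, x) {
    grep(pattern, x, value = TRUE)
}

# Hypothetical: evaluate expr and return the elapsed wall-clock seconds
util$timeit <- function(expr) {
    t0 <- proc.time()[["elapsed"]]
    force(expr)  # expr is a lazy promise; evaluate it now
    proc.time()[["elapsed"]] - t0
}

# Remove any copy attached by an earlier source() so the search path
# never holds more than one "util" entry, then attach the fresh one.
while ("util" %in% search()) detach("util")
attach(util)
```

After sourcing the file, both util$bgrep("^a", c("apple", "banana", "avocado")) and plain bgrep(...) return "apple" "avocado". Meanwhile, ls() lists util but not bgrep.

attach() copies the environment's contents onto the search path. As a result, a function added to util later is not visible as a bare name until the file is sourced again. The detach loop is what makes that re-sourcing safe.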
Putting your code in an R package gives you:

- a quasi-automatic way to organize your code by topic
- strong encouragement to write a help file, which makes you think about the interface
- a lot of sanity checks via R CMD check
- a chance to add regression tests
- namespaces as well
Just running source() over code works for really short snippets. Everything else should go in a package, even if you do not plan to publish it, because you can write internal packages for internal repositories.
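If the functions are already loaded in your session, utils::package.skeleton() is one quick way to start. It writes the DESCRIPTION, NAMESPACE, R/ and man/ files for you. The package name myutils and both function bodies are hypothetical examples.

```r
# A minimal sketch, assuming the functions already exist in your session.
# "myutils" and both function bodies are hypothetical examples.
bgrep <- function(pattern, x) {
    grep(pattern, x, value = TRUE)
}

timeit <- function(expr) {
    t0 <- proc.time()[["elapsed"]]
    force(expr)
    proc.time()[["elapsed"]] - t0
}

# Write DESCRIPTION, NAMESPACE, R/ and man/ for the listed objects
path <- tempdir()
utils::package.skeleton(name = "myutils", list = c("bgrep", "timeit"),
                        environment = environment(),
                        path = path, force = TRUE)

pkg_dir <- file.path(path, "myutils")
print(list.files(pkg_dir, recursive = TRUE))
```

The generated .Rd help files are templates to be filled in. Once they are, running R CMD build and R CMD check on the directory from a shell gives you the sanity checks listed above.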

As for the 'how to edit' part, the R Internals manual has excellent R coding standards in Section 6. Otherwise, I tend to use the defaults of Emacs' ESS mode.
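The question also asks about parentheses and closing delimiters. Here is one common layout, sketched assuming a basic indent of 4 spaces, which is the R coding standards' recommendation for R code. In this layout:

- continuation lines line up under the opening (
- a closing ) stays on the line of the last argument
- a closing } gets a line of its own

summarize_groups() is a hypothetical helper that exists only to show the layout.

```r
# Hypothetical helper, used only to illustrate line-break conventions:
# per-group mean and count of one numeric column of a data frame.
summarize_groups <- function(df, value, group) {
    groups <- split(df[[value]], df[[group]])
    # Arguments that do not fit on one line are aligned under "("
    stats <- vapply(groups,
                    function(v) c(mean = mean(v), n = length(v)),
                    numeric(2))
    t(stats)  # one row per group, columns "mean" and "n"
}
```

For example, summarize_groups(data.frame(x = 1:4, g = c("a", "a", "b", "b")), "x", "g") gives one row per group, with a mean of 1.5 for group a.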
