Tools useful in data science programming

In preparation for a data science class at Stanford, I've been learning more about the programming languages and tools used in that field. Three of them, described in the following paragraphs, are SAS (originally Statistical Analysis System), R, and Python.

I used SAS in the 1980s to analyze and display SMF (System Management Facilities) data generated by the IBM MVT and SVS operating systems in use at SLAC (Stanford Linear Accelerator Center), where I was a systems programmer. In those days SAS was available only for IBM mainframes. I found SAS easy to use for my relatively simple projects, mostly histograms and basic statistics on batch-job turnaround and interactive-user activity. SAS products still exist, but they carry a high price tag.

More recently I have been experimenting with the R programming language and with the pandas and matplotlib modules for Python 3.5. Both tools have the advantage of being free and available on all major modern operating systems; I have been using the macOS versions of both. R is a domain-specific language for statistics, and Python is a widely used general-purpose programming language.

I have found that if I confine myself to simple plotting and statistics tasks, R is easy enough to use, but the full programming language is not to my liking. This is probably because I already know Python very well, so it is much easier for me to do my work with Python and familiar modules than to learn another programming language just for data science.
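A minimal sketch of this kind of simple task in Python, assuming pandas and matplotlib are installed: it computes basic statistics and draws a histogram for a small set of batch-job turnaround times. The numbers, the output file name turnaround_hist.png, and the choice of five bins are hypothetical, chosen only for illustration.

```python
# A minimal sketch: basic statistics and a histogram of batch-job
# turnaround times using pandas and matplotlib.
# The data below are hypothetical sample values, not real measurements.
import matplotlib
matplotlib.use("Agg")  # render to a file; no display window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical turnaround times, in minutes, for ten batch jobs
turnaround = pd.Series([3, 5, 7, 4, 12, 30, 6, 8, 5, 15], name="turnaround_min")

# count, mean, std, min, quartiles, and max in one call
summary = turnaround.describe()
print(summary)

# Histogram of the same data, saved as a PNG file (hypothetical file name)
fig, ax = plt.subplots()
ax.hist(turnaround, bins=5)
ax.set_xlabel("Turnaround (minutes)")
ax.set_ylabel("Number of jobs")
ax.set_title("Batch job turnaround (hypothetical data)")
fig.savefig("turnaround_hist.png")
plt.close(fig)
```

Series.describe() returns the count, mean, standard deviation, minimum, quartiles, and maximum in a single call, and the Agg backend lets the script run without a graphical display.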

For an introduction to R programming, I recommend R For Dummies by Andrie de Vries and Joris Meys (Wiley).