I was pretty intrigued when I read this blog post about the new “Pro” version of Wolfram Alpha, with passages like:
The key idea is automation. The concept in Wolfram|Alpha Pro is that I should just be able to take my data in whatever raw form it arrives, and throw it into Wolfram|Alpha Pro. And then Wolfram|Alpha Pro should automatically do a whole bunch of analysis, and then give me a well-organized report about my data. And if my data isn’t too large, this should all happen in a few seconds.
And what’s amazing to me is that it actually works.
So I signed up for an account; at $5 a month (introductory price), I would have been willing to pay for a few months just to try it out, but as it happens, they also have a free 2-week trial which I duly activated. Now I was looking forward to those cool automatic PCA plots and linear regression auto-magically appearing upon uploading my data …
The first letdown is that there is a 1-megabyte limit to the data upload, so I guess we can safely say that Wolfram Alpha Pro is not a “big data” thing … Joking aside, 1 MB is really not enough to make this service anything more than a toy analytics sandbox; at least for me, I would need to subsample practically all of the datasets I work with to even be able to upload the data for analysis.
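For anyone who does want to squeeze a larger table under the limit, here is a minimal sketch of the kind of subsampling I mean. It assumes the file has a single header line and that "1 megabyte" means 1,000,000 bytes (the exact limit is my guess); the function name `subsample_lines` is made up for illustration.

```python
import random

def subsample_lines(lines, max_bytes=1_000_000, seed=0):
    """Keep the header plus a random subset of rows that fits in max_bytes.

    `lines` is a list of strings, each including its trailing newline.
    The 1,000,000-byte default is an assumption about the upload limit.
    """
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)  # visit rows in random order

    budget = max_bytes - len(header.encode("utf-8"))
    kept = []
    for i in order:
        size = len(rows[i].encode("utf-8"))
        if size > budget:
            break  # stop at the first sampled row that no longer fits
        budget -= size
        kept.append(i)

    kept.sort()  # restore the original row order
    return [header] + [rows[i] for i in kept]
```

Reading a file with `open(path).readlines()` and writing the result back with `writelines` would be enough to try it on a real table.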
Still: the output shown in the blog looked cool, so I tried to upload a few files, with the following results:
1. A CSV file that came from screen-scraping some Kaggle leaderboards for a visualization I was planning to do with Joel but which we never bothered to finish. This CSV file is a bit “dirty” (lots of missing values, some rows are longer than others) so I wasn’t expecting a clean import, but what actually happens is that Wolfram Alpha Pro just gets stuck forever, with a window saying “Processing file.csv.” I’ve tried this three times now with the same result. This is a bit annoying; if the input doesn’t look clean, it would be more useful to get an error message saying that the data could not be processed.
2. A CSV file from an old data set about gene expression in neural stem cells which I used to work on. This is a perfectly ordinary CSV file with a regular matrix structure, a header line, and no missing values; it can be readily imported into R without errors, for instance (read.csv("file.txt") works fine). However, after uploading, I get the message “Wolfram|Alpha doesn’t know how to interpret your input.”
3. OK, so no luck with the CSV files, let’s try a tab-separated file. This time I tried a table of protein complex abundance values in a certain type of cancer, something we are working on for a paper. It was successfully imported into Wolfram Alpha, but the “analysis” I got had completely missed that it was a numeric data set, and only gave me information about word counts, character frequencies, frequency of capitalized words etc. The structure of the file is that it starts with two comment lines (beginning with “#”, as is the custom), after which all lines are tab-separated with a numeric ID in the first column, a complex name (which can consist of several words) in the second column, followed by eight columns with numeric values (complex abundances in different sets of tissues). Apparently the second column is enough to throw off the system.
4. Last try. Let’s give it something more standard: a tab-separated file with a header line and the other lines consisting of a cell ID in the first column followed by all numerical values. That is, the first column consists of (one-word) IDs; the rest is numeric. (This is from an old project on single-cell gene expression.) This is a very common way to format tables in flat text files. But again, I get a textual analysis with overrepresented words etc. I guess I need to remove the ID column. (*removing the ID column*) No, it didn’t help, I still get the textual analysis although all of my values except those in the first line (the column names) are numeric.
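To make the file structures described above concrete, here is a rough sketch of a local sanity check that could be run before uploading. It says nothing about how Wolfram Alpha Pro parses files; it only reports the properties I mention in the list (comment lines, ragged rows, which columns are numeric). The function name `describe_table` and its parameters are invented for this example.

```python
import csv

def is_number(cell):
    """True if the cell parses as a float; empty cells count as non-numeric."""
    try:
        float(cell)
        return True
    except ValueError:
        return False

def describe_table(text, delimiter="\t", comment="#", has_header=True):
    """Summarize a delimited text table.

    Skips blank lines and lines starting with `comment`, then reports the
    number of data rows, the distinct row widths (more than one value means
    ragged rows), and whether each column is numeric in every data row.
    """
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith(comment)]
    rows = list(csv.reader(lines, delimiter=delimiter))
    if not rows:
        return {"n_rows": 0, "widths": [], "numeric_columns": []}

    body = rows[1:] if has_header else rows
    widths = sorted({len(r) for r in rows})

    numeric = []
    for j in range(max(widths)):
        cells = [r[j] for r in body if j < len(r)]
        numeric.append(bool(cells) and all(is_number(c) for c in cells))

    return {"n_rows": len(body), "widths": widths, "numeric_columns": numeric}

# A tiny made-up table shaped like file 3: two comment lines, a numeric ID,
# a multi-word name, then numeric values.
sample = (
    "# comment one\n"
    "# comment two\n"
    "1\tComplex A subunit\t0.5\t1.2\n"
    "2\tComplex B\t0.7\t3.4\n"
)
print(describe_table(sample, has_header=False))
```

For a table like file 3, only the name column should come back as non-numeric, which is what I would have hoped an automatic import would notice.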
I’m sure I’m doing something wrong, but the point is I shouldn’t need to worry about these things, given what the product claims to be able to do …
Still looking forward to exploring Wolfram Alpha Pro when I’ve figured out what formats it can work with!
P.S. Follow the Data will be launching a podcast series in a few weeks – stay tuned! We’re very excited about that.