Follow the Data

A data driven blog

Archive for the tag “data-science”

Finnish companies that do data science

I should start by saying that I have shamelessly poached this blog post from a LinkedIn thread started by one Ville Niemijärvi of Louhia Consulting in Finland. In my defence, LinkedIn conversations are rather ephemeral and I am not sure how completely they are indexed by search engines, so to me it makes sense to sometimes highlight them in a slightly more permanent manner.

Ville asked for input (and from now on I am paraphrasing and summarising) on companies in Finland that do data analytics “for real”, as in data science, predictive analytics, data mining or statistical modelling. He required that the proposed companies should have several “actual” analysts and be able to show references to work performed in advanced analytics (i.e. not pure visualization/reporting). In a later comment he also mentioned price optimization, cross-sell analysis, sales prediction, hypothesis testing, and failure modelling.

The companies that had been mentioned when I went through this thread are listed below. I’ve tried to lump them together into categories after a very superficial review and would be happy to be corrected if I have gotten something wrong.

[EDIT 2016-02-04 Added a bunch of companies.]

Louhia: analytics consulting (predictive analytics, Azure ML etc.)
BIGDATAPUMP: analytics consulting (Hadoop, AWS, cloud etc.)
Houston Analytics: analytics consulting (analytics partner of IBM)
Top Data Science: analytics and IT consulting
Gofore: IT architecture
Digia: IT consulting
Techila Technologies: distributed computing middleware
CGI: IT consulting, multinational
Teradata: data warehousing, multinational
Avanade: IT consulting, multinational
Deloitte: financial consulting, multinational
Information Builders: business intelligence, multinational
SAS Institute: analytics software, multinational
Tieto: IT services, multinational (but originally Finnish)
Aureolis: business intelligence
Olapcon: business intelligence
Big Data Solutions: business intelligence
Enfo Rongo: business intelligence
Bilot: business intelligence
Affecto: digital services
Siili: digital services
Reaktor: digital services
Valuemotive: digital services
Solita: digital services
Comptel: digital services?
Dagmar: marketing
Frankly Partners: marketing
ROIgrow: marketing
Probic: marketing
Avaus: marketing
InlineMarket: marketing automation
Steeri: customer analytics
Tulos Helsinki: customer analytics
Andumus: customer analytics
Avarea: customer analytics
Big Data Scoring: customer analytics
Suomen Asiakastieto: credit & risk management
Silta: HR analytics
Quva: industrial analytics
Ibisense: industrial analytics
Ramentor: industrial analytics
Indalgo: manufacturing analytics
TTS-Ciptec: optimization, sensor
SimAnalytics: logistics, simulation
Relex: supply chain analytics
Analyse2: assortment planning
Genevia: bioinformatics consultancy
Fonecta: directory services
Monzuun: analytics as a service
Solutive: data visualization
Omnicom: communications agency
NAPA: naval analytics, ship operations
Primor consulting: telecom?

There was an interesting comment saying that CGI manages its global data science “virtual team” from Finland and that they employ several successful Kagglers, one of whom was rated #37 out of 450,000 Kaggle users in 2014.

On a personal note, I was happy to find a commercial company (Genevia) which appears to do pretty much the same thing as I do in my day job at Scilifelab Stockholm, that is, bioinformatics consulting (often with an emphasis on high throughput sequencing), except that I do it in an academic context.

Coursera Introduction to Data Science Course

I promised in an earlier blog post to report back on the Introduction to Data Science course given by Coursera. Paradoxically, I didn’t finish it although I think it was the best of the three online data-related courses that I started this year (the others were Data Analysis on Coursera and Introduction to Data Science at Syracuse University). I think my non-finishing was related to some sort of MOOC fatigue and just the fact that I had too much going on. Also, the descriptions of the last mandatory assignments (Tableau and Kaggle) were a bit too vague for my schedule (I finished all the quizzes and the programming assignments). Enough excuses – I think this was an excellent course which covered a lot of ground, from SQL via Map/Reduce to machine learning. In particular, the Map/Reduce programming assignments (which used a Python Map/Reduce library) were helpful to me. Highly recommended (and yes, I’ll try to finish it up next year).
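
For readers who have not met Map/Reduce before, here is a minimal sketch of the idea in plain Python. It does not use the library from the course (which I have not named here), and the function names map_phase, shuffle, reduce_phase and run_job are my own, chosen for illustration only. It counts words across a couple of made-up documents.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as a Map/Reduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reducer: collapse the list of counts for one word into a total."""
    return word, sum(counts)


def run_job(documents):
    """Run map, shuffle and reduce in sequence on a dict of documents."""
    pairs = [
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    ]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


# Hypothetical input documents, invented for this example.
docs = {"a": "data science is fun", "b": "Data is everywhere"}
counts = run_job(docs)
print(counts)
```

In a real framework the map and reduce steps run in parallel on many machines and the shuffle is handled for you; the sketch only shows how the three steps fit together.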

Online course experiences: Coursera Data Analysis and Syracuse U. Data Science

I’ve been following two online data analysis related courses during the past few months: the Data Analysis course given by Johns Hopkins U. through Coursera and the Introduction to Data Science course given by Syracuse U. through Coursesites.

The Data Analysis course is the third one that I have enrolled in on Coursera, and the first one where I have completed all the coursework (I received my statement of accomplishment this past weekend, yippee!). Of the two previous courses I had enrolled in, I had tried to follow one but given up because of problems with the platform incorrectly grading the quizzes – a childish thing to quit a course over, because it’s the things you learn that should matter, but I felt that the weird grading made me uncertain about what parts of the material I had really understood.

I think the Data Analysis course was quite good, because it focused not only on R and statistics (which is great) but also on more practical aspects of data analysis, like how you might organize your files and write up a good analysis report. It introduced me to things like R markdown and knitr, which I had heard about but not used until now. The course contents were also surprisingly up to date, with things like the medley R package being included in the video lectures. This package, which was developed by a Kaggle competitor for constructing ensemble models more easily, was first mentioned in January 2013 on a Kaggle forum and doesn’t yet exist as an R package, yet it was covered in the course with nice examples of how to run it!

There is a “post-mortem” podcast at Simply Statistics where Jeff Leek (the main instructor of the course) and Roger Peng discuss what went right and what went wrong.

The course videos are on YouTube and course lectures are available on GitHub; both videos and lectures are tagged by week. Some numbers on participation given by Jeff Leek:

There were approximately 102,000 students enrolled in the course, about 51,000 watched videos, 20,000 did quizzes, and 5,500 did/graded the data analysis assignments.
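
To put those numbers in proportion, here is a small sketch that turns them into shares of the enrolled students and of the previous stage. The counts are the approximate ones quoted above; the stage labels and the funnel_shares helper are my own naming, not something from the course.

```python
# Approximate participation figures as quoted by Jeff Leek.
funnel = [
    ("enrolled", 102_000),
    ("watched videos", 51_000),
    ("did quizzes", 20_000),
    ("did/graded assignments", 5_500),
]


def funnel_shares(stages):
    """Return (stage, count, share of first stage, share of previous stage)."""
    total = stages[0][1]
    previous = total
    rows = []
    for name, count in stages:
        rows.append((name, count, count / total, count / previous))
        previous = count
    return rows


rows = funnel_shares(funnel)
for name, count, of_total, of_previous in rows:
    print(f"{name:<24}{count:>9,}{of_total:>9.1%}{of_previous:>9.1%}")
```

So about half of those enrolled watched videos, roughly a fifth did quizzes, and a little over 5% did or graded the data analysis assignments.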

Personally, I would perhaps have liked the contents to be slightly more difficult (because I came in with a fair amount of subject knowledge) but on the other hand the given level of difficulty let me get away with spending 3-5 hours per week on average on the course, as advertised. I think many students spent a lot more time than that.

The other course that I participated in, Introduction to Data Science from Syracuse University, was similar to the Data Analysis course in that it used R as the vehicle for introducing statistical concepts. However, this course was much more limited in scope and basically did not assume that the students had had any prior exposure to statistics or programming. I felt that this was a mismatch for me and in the end did not finish all of the coursework. I did read the accompanying textbook which, in parts, did a very good job of explaining the value of data analysis in real-world scenarios. I felt that the course would be most useful for people who are curious about “big data” and “data science” and want to dip their toes into it a little bit but not necessarily work with data analysis. Maybe this was the intention.

Foolhardy as I am, I plan to take another MOOC data science course beginning in May, namely Introduction to Data Science. I’ll report back here afterwards!

Open data scientist positions in Stockholm

I noticed there are several “data scientist” type positions being advertised at the moment in Stockholm. Here is a sampling based on what I was able to find easily.

Data scientist at Wrapp 

We are now looking to expand our technical team with a hands-on data analyst/scientist. You will be both analyzing the data and building the systems to collect, store, slice and dice the data in tight co-operation with our back- and frontend teams.

You likely have a computer science or physics degree, a deep understanding of statistics – an area too frequently missed by software engineers – and an insatiable curiosity. You should also be able to clearly communicate your findings in English as well as using charts, diagrams and numbers.

Experience with web analytics software, SQL databases, Hadoop and Hive, R and Python scripting will certainly help.

Data Analysis Team Lead at AlaTest

You will build and extend full stack solutions, from Opinion mining and Sentiment analysis, with ETL and event-driven dataflows, all the way to REST-style APIs for accessing the data, generative grammars for producing natural language summaries and visually rich HTML5 presentations.

Data Scientist – Gaming Analytics, King.com (also available in Malmö and London)

  • Experience of dealing with large, real-world datasets (SQL)
  • In depth statistical, machine learning, Bayesian knowledge
  • Hands on experience with data mining, clustering, segmentation
  • Software engineering skills, in languages such as Java, R, Python, etc.
  • “Big data” systems experience (Hadoop, Hive, MapReduce, etc)
  • Robust statistical significance testing, A/B testing, predictive analytics
  • Advanced use of Excel spreadsheets for analytical purposes

Analyst at Spotify

We are looking for an outstanding junior analyst to guide Spotify’s business decisions by crunching numbers, analyze the results, gain insight, identify opportunities and then aid the rest of the organization in making the correct decisions.
Daily we collect vast amounts of data, which contains extremely valuable business intelligence. Everyday we analyze terabytes of data in order to get statistics about Spotify and our users, learn about artist trends, target advertisements better and much more. The goal is to improve and make the Spotify service better.

Quantitative Analyst at Klarna

As a Quantitative Analyst you will explore large data sets to identify underlying patterns, trends, and key risk drivers. It is your task to identify areas of improvement and take initiatives to solve these by making tests, analyse the results and implementing new features. You will also develop new models to track risk metrics connected to our dunning procedure, for new and existing markets. There are great opportunities for impacting the direction of your work and challenge yourself with exciting work tasks.

Generalists make good data scientists

A few quotes from an interview with Pete Skomoroch at the Metamarkets blog:

Some of the best data scientists I see often have worked in a few different domains. I think that helps with creativity and problem solving. A nice way to sum data scientists up that I’ve heard: “They’re better statisticians than your average programmer and they’re better programmers than your average statistician.”

[…]

Working in different domains is good for people who are intellectually curious and just like solving problems in general. It can be a challenge, but it keeps life interesting. You see commonalities. If you take a really good data scientist and they’ve been working in bioinformatics, and then you drop them into a consumer internet company, they can often ramp up fairly quickly, pick up some domain knowledge and then start solving problems.

Also check out the Metamarkets blog interviews with Claudia Perlich and Drew Conway.

Summer reading

Some nice reading for the summer (in case of a rainy day of course):

  • Prediction, Learning and Games (PDF link) – Nice textbook on prediction. Via @ML_hipster (worth following on Twitter if you like @bigdatahipster and/or authentic, hand-crafted decision trees)
  • Data Science 101, a very nice blog which points to a multitude of resources
  • School of Data and the accompanying Data Wrangling Handbook
  • Agile Data by Russell Jurney (who is well worth following on Twitter and especially Quora). This book isn’t finished yet but can be viewed in its current state of development at the given link, which is within the Open Feedback Publishing System at O’Reilly Media. So you can, on one hand, read the book (or parts of it) for free before publication, and on the other hand, provide feedback and thus shape the contents of the book.
  • (edit 17/7 2012) Might as well throw this one in: Data Jujitsu: The Art of Turning Data into Product by DJ Patil, a free O’Reilly Radar report (epub/PDF/mobile).
