Unicorns, Data Scientists, and other mythical creatures

Hi all! It has been a while since I’ve written a blog post, and since my last post in January, lots of exciting things have happened to me. Those who have been following me on LinkedIn or listening to the AB Testing Podcast know that I have taken a new job as the data science manager on the Azure team. In the short time since I started in March, I can easily say this is absolutely the best job I have ever had. In no small sense, it really feels like a job I was meant to do from the start of my career.

It’s funny to me. As I sit and type this, I am reminded of my mentor in high school, my calculus teacher. He was a cranky man who, on the one hand, loved his job, but on the other, felt he had settled for second best in life. He frequently said that one day, when *his* high school math teacher died, he would go piss on that grave. He felt that his mentors had not set him up to succeed in life, and he was driven not to repeat that error with his students. That’s a story for another day, but soon after college, I would learn that same sense of responsibility to teach others what I wasn’t taught. My mentor wanted me to be an Actuary. I loved math (still do) and went to college with that in mind. I bolted on a Computer Science degree once I learned how much I loved it, but upon graduating, my Math degree would encapsulate knowledge that, in general, I would not use for 20 years. Until about 4 years ago.

Needless to say, I really love my job, what I am doing, and the problems I am solving. I do wish, now and again, that I could wind back the clock and be where I am today, but with another 20 years to master my new direction. C’est la vie. So much cool stuff to learn and use, and so little time.

As with my last team, my new team did not have much experience in this space when I started, and I am grateful to be on the ground floor of what we are building there. As in many places in my company, many of the folks are former SDETs, and dealing with the change is an ongoing challenge, but not one I am unfamiliar with. Honestly, it is going nicely in my humble opinion, but as more and more people learn what data can do for a business, the pressure to hire and train more data scientists keeps increasing. In the last 9 months, spontaneous 1:1s have increased by an order of magnitude, with folks who are: 1) looking to hire a “data scientist”, 2) looking to become one, or 3) looking to preserve their current position. Today’s post is mostly about #1, although #2 is also interesting to me, as the majority of those folks have lately been Program Managers (which might signal a change in that discipline). #3? That’s most of what I talk about on my blog and on the podcast, but if you have specific questions, send me a tweet. We will feature it as part of our Mailbag segment on the podcast. We love the mailbag!

This post was inspired by a talk I attended at this year’s Strata conference in New York. The presenter, Katie Kent, gave a talk on Unicorn hunting in the data science world, which I thought was fantastic. Her company, Galvanize.com, offers a 12-week immersive course that claims to prep you for a Data Science role with a 94% success rate. I haven’t researched this myself, but maybe I will test it with a few employees of mine and report back. It might be another option for the #2 folks I mentioned above. I can say Katie’s talk was great, and it resonated. Many of the discussions I have had recently were *exactly* in this problem space: managers coming to me or my manager, hoping to learn quickly from us as vanguards, asking how to take advantage of Data Science, and saying “maybe if I can get an open position, you can help me hire one?”


Yup, that’s right.   One!


Katie’s talk was about Unicorn Hunting. The elusive Unicorn is the single, perfect Data Scientist a company could hire to solve all of its needs in the data science space. I regret to inform all of you (except Katie, who already knows this): they are extremely rare (perhaps rarer than the Unicorn), and if you can find one, you will probably not be able to afford them (note: if you can, though, you should!). The challenge is that this perfect Data Scientist would have to be an expert in too many very distinct fields. The ones I know of are indeed Rockstars (in the Stats world), but they aren’t the folks you will successfully hire into your one-off position.

One new experience for me on my new team has been hiring Data Scientists. Almost all of the folks I have interviewed have held PhDs, but the best have held only Master’s degrees. To date, I have not met a candidate with only a Bachelor’s degree. <aside> This, by itself, is interesting. If one can become a data scientist in only 12 weeks, why not in 4 years? </aside> Master’s candidates are great, I think, because they have 1) learned some depth and 2) stayed grounded in the application of their craft. I’ve learned that there’s a big debate in academia about whether post-doctoral researchers should join Industry. They are encouraged to push the boundaries of the science, and I think somewhere along the way, the drive to apply it to real life dissipates. Master’s graduates are, in my humble opinion, more applied scientists than theorists and, as a result, more immediately useful to a business. This is an exaggeration, of course, but it highlights another cause of the Data Scientist shortage. The PhD folks who do venture into industry and survive it are helpful. They are able to pull the practiced, but older, learnings of Data Science closer to what’s currently known, which accelerates everyone involved.

However, this is knowledge work, and there’s too much of it. Due to the cognitive limitations of any single human, it will *always* be rare for just 1 person to be good enough at everything that is needed end-to-end.

Depending on whom you talk to, there are multiple definitions of a Data Scientist’s job. My present favorite: a Data Scientist helps a business drive action by understanding and exploiting relationships present in the data. There are 4 key principles buried in that definition.

4 Key Data Science Principles:

  1. Actionability – the recommendations must be interesting, valuable, and within the means of the business
  2. Credibility – in this business, Objective Truth is *everything*. It is OK to communicate confidence intervals, but it is not OK to be wrong. Data Science teams get, AT MOST, one chance to present wrong data, insights, or recommendations to an executive. It is wise to remember this.
  3. Understanding Relationships – this is the bread-and-butter work most data scientists are hired to do. There is a vast sea of techniques for digging knowledge out of data. One must also be able to understand what the results mean. Domain Knowledge is critical.
  4. Data – lots and lots and lots of it
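Principle 2 draws a line between communicating uncertainty and being wrong. As a minimal sketch of what “communicating a confidence interval” can look like, the snippet below reports a mean together with a 95% interval. It assumes independent samples and a normal approximation (for small samples a t-interval would be safer); the function name `mean_confidence_interval` and the session-length numbers are hypothetical, for illustration only.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(samples, confidence=0.95):
    """Return (estimate, lower, upper) for the mean of `samples`.

    Assumes independent samples and a normal approximation to the
    sampling distribution of the mean.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    estimate = mean(samples)
    standard_error = stdev(samples) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return estimate, estimate - z * standard_error, estimate + z * standard_error


# Hypothetical per-user session lengths in minutes (illustrative only).
sessions = [12.0, 15.5, 9.8, 14.2, 11.1, 13.7, 10.4, 16.0, 12.9, 13.3]
est, lo, hi = mean_confidence_interval(sessions)
print(f"mean {est:.2f} min, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Presenting the interval rather than the bare point estimate tells an executive how much the number could move, which is the kind of honesty principle 2 asks for.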

To succeed, one must turn data into knowledge and make knowledge work. Sounds good as a t-shirt motto, but in practice, this is hard. It takes deep knowledge from several disciplines to turn this into something efficient that scales not only to the amount of data being processed, but also to the timeline the business requires in order to benefit from the discoveries.


In my experience, it takes a team to pull this off.


From what I have observed, here’s what you need:

  1. Data scientists – starting with the obvious. However, what may not be obvious is that Data Science is a very wide umbrella with 2 major branches, and you will likely need both.
    1. Applied Statistics – the ability to confirm or refute known hypotheses in a deductive manner. You already have a belief in hand and are trying to prove or disprove it. Applied Stats techniques tend to be faster than Machine Learning. A few simple histograms, which are easy to pull together, are an (exaggerated) example.
    2. Data Mining – using Machine Learning techniques in an inductive manner. You start without a preformed belief, but with a goal instead (such as predicting when a customer will churn), and let interesting patterns within the data reveal themselves. (Note: “interesting” is a Data Science term, and it can be measured.) Machine Learning techniques, in my experience, handle Big Data problems better; they scale better with the size of the data.
  2. Data Engineers – engineering the movement, storage, indexing, parallelization, cleansing, and normalization of data is a very hard problem, and MOST data scientists do not know how to do it. As Big Data grows, this role, already critical, becomes even more so. Credibility starts with the data, and these folks are key to caring for and monitoring it. They should be paying attention not only to traditional RDBMS solutions, but also to technologies such as Hadoop, Splunk, Azure Data Lake, etc. Each of these solutions comes with its own pros and cons, and you need someone who knows what they are doing. These folks should understand the architectures end to end, from data emission to visualization. There is NO silver bullet, and you need a person who understands the trade-offs. Every executive wants 1) a cheap solution, 2) near-real-time results, and 3) all of the data included. Cheap, fast, or good: pick 2.
  3. Computer Scientist – especially in distributed computing. The current state of the art for Big Data is to parallelize and send your code to the machine that stores the data to do the calculations (MapReduce, in a nutshell). This greatly reduces the time spent, since code is easier to move than data, but even so, many of the techniques are O(N²). Quadratic time is too slow (or too expensive, even with millions of machines running in parallel). There’s an active quest for O(N) and O(1) solutions to Data Science problems, as well as for clever Data Structures that help reduce time and storage costs. One area I have not spent nearly enough time on is the use of heuristics. More on that later.
  4. Domain Knowledge Expert – even if you have the people with the skills above, you still need to understand what the data *means* in order to move forward. Typically, the data is telemetry emitted by code written by lots of product developers. It is unreasonable to expect 1 person to know everything about how the product works, but you will NOT succeed if your in-house data science team knows nothing.
  5. Business Expert – You need to be able to understand what the business goals are and how to translate your uncovered insights to support decision making.   This takes art in communication and visualization.
  6. Agile – an agile coach is needed to pull these folks together and get them working toward common goals. It does NOT make sense to overemphasize *any* of the above specialties. All roles are necessary to succeed, and since this team is working to improve the business, adaptability is key. As new knowledge is gained, the team needs to be able to shift in the new direction sustainably. This happens A LOT!
  7. Manager – really, I wanted to call this role Orchestrator. You need a Systems Thinker who crafts strategies to the best effect for the business, because all of these folks need to work together.
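The “send your code to the data” idea from the Computer Scientist role can be illustrated with a toy, single-process sketch of the MapReduce pattern. Each list in `partitions` stands in for a shard of telemetry sitting on a different machine; the map step summarizes its shard locally, and only the small partial summaries travel to the reduce step. This is not Hadoop’s (or any other framework’s) API; the shard contents and the names `map_partition` and `reduce_counts` are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical shards of telemetry events. In a real cluster each list would
# live on a different machine, and map_partition would be shipped to it.
partitions = [
    ["click", "view", "view", "error"],
    ["view", "click", "view"],
    ["error", "error", "click"],
]


def map_partition(events):
    """Map step: runs where the data lives and emits a small local summary."""
    return Counter(events)


def reduce_counts(left, right):
    """Reduce step: merges partial summaries; only these cross the network."""
    return left + right


partial_counts = [map_partition(shard) for shard in partitions]
totals = reduce(reduce_counts, partial_counts, Counter())
print(dict(totals))
```

Each event is touched once in the map step, so the work is linear in the data; the savings come from moving three small counters instead of every raw event.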


You can scale these “requirements” up or down depending on the problems your team is facing, but the people on the team should be working in unison, like a choreographed dance or an orchestra. They should be one team, not a collection of individual teams, focused on delivering vertical slices of “value”. No one person can do all of the above. 3 to 4 people might be the minimum for a team that produces something considered valuable. IMHO, 5 to 7 folks with the above skills is about right, as long as you have the right depth and breadth.

Lastly, AB podcast listeners will note that I frown on specialists. This is true here as well. Every person on your team should be able to perform at least 2 of the functions above (ideally, 3), with at least 1 area where they can competently achieve deep results. In addition, it’s a good idea to minimize the overall overlap of expertise on your team and rely on your Agile coach to create an environment of knowledge sharing and team cooperation.

In closing:

If you are patient, you will be able to build a team of folks who, together, have mastery of the above, and if you are also lucky, you can keep that team small. But don’t try for the Unicorn. They do exist, but they are too hard to find and too expensive. Even if you manage to land one, if the business isn’t prepped to include them in the strategy, you will likely not get the value you hope for from them. Knowledge work cannot be done in a silo, especially in the fleeting windows of opportunity many are encountering today.

Thanks for reading and Happy Holidays!