I've spent 13 years as a data scientist, and I’m not a fan of how diluted the title has become.
Let’s talk about what the role really requires.
It’s widely accepted that data science requires domain knowledge. That’s because cleaning, investigating, analyzing, and modelling data can’t meaningfully happen outside of its proper context. Business context, problem context, and data context all inform what approach makes sense, how the solution should be designed, and what the end results even mean.
But it’s pretty hard for a data scientist to be an expert in the business, the specific problem, and also everything upstream of the data they’re handed.
So we add communication to the list of requirements, because data scientists need to talk with stakeholders.
But what we really mean is that data scientists should be able to:
👉 Listen, ask questions, and really “get” what’s going on
👉 Turn all of that into a conceptual approach that makes sense
👉 Turn the approach into math and code that works
👉 Translate the results into stakeholder speak
👉 Tell stakeholders something useful in words they understand
There’s a lot going on there. We need to translate between 3 worlds:
• Real world
• Data science concepts
• Programming
When translating between two languages, fluency in both is required. The same sort of thing applies here.
This is why so many employers insist on hiring people who have prior knowledge of a particular industry. It cuts down on employee time and effort spent on steps 1, 2 & 4. (They also use case study interviews to test these.)
Then they test for technical skills to cut down on time spent in step 3.
This theoretically maximizes time spent on step 5, which is the part employers care about because it's where the $ comes from.
It sort of makes sense as a hiring approach, especially if the role is very specialized and the problems tend to look pretty similar most of the time. Large companies can afford to do this.
I’d argue that it makes a lot less sense if you believe (and I do) that a big part of the data scientist’s job is to be able to figure out what makes the most sense across a variety of problems that are never quite the same. If that’s the case, they’re ALWAYS going to be learning new domain knowledge, problem contexts, and conceptual frameworks. Being too accustomed to taking the same approaches can actually be a liability.
I'm not sure where I'm going with this, but TL;DR:
💡 If you’re a data scientist and want to work on new problems, develop and demonstrate general skill at steps 1, 2 & 4
💡 If you work with data scientists, don’t hold back business, problem, or data context just because you don’t want to “complicate” things or reveal too many secrets
💡 If you’re hiring data scientists, think carefully about what skills you actually need and whether you can be more flexible