Part 1 - The Engineering Mindset
A lot of aspiring data scientists come from non-engineering backgrounds, which can present some challenges when breaking into the field.
Don't get me wrong, it is great to bring a unique background and perspective to the world of data science, but it's not enough.
Engineering is a critical part of being a data scientist.
Developing the right mindset will not only help you break into the field, but it will also make you far more effective when you get there.
Learning the engineering mindset will help you see the big picture, systematically solve underlying problems, and stay focused on results.
Here are my top tips for cultivating the engineering mindset for data scientists:
Build systems to solve problems
It's not about hacking together a one-off solution. It's about building systems to solve problems. Systems that are testable, robust, efficient, scalable, and portable. Systems that you can reuse. Systems that can accommodate new data sources and challenges in the future.
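As a rough illustration of what "accommodating new data sources" can look like in Python, here is a minimal sketch. Every name in it (DataSource, InMemorySource, Pipeline, and the two helper functions) is hypothetical and invented for this example. The idea is only that the cleaning and scoring steps are written once and do not care where the records come from.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Protocol


class DataSource(Protocol):
    """Anything that can yield raw records (hypothetical interface)."""

    def records(self) -> Iterable[dict]: ...


@dataclass
class InMemorySource:
    """Stand-in source; a CSV or database source would expose the same method."""

    rows: List[dict]

    def records(self) -> Iterable[dict]:
        return iter(self.rows)


@dataclass
class Pipeline:
    """Runs the same cleaning and scoring steps over any DataSource."""

    clean: Callable[[dict], Optional[dict]]
    score: Callable[[dict], float]

    def run(self, source: DataSource) -> List[float]:
        cleaned = (self.clean(r) for r in source.records())
        return [self.score(r) for r in cleaned if r is not None]


def drop_missing_price(record: dict) -> Optional[dict]:
    # Discard records with no price instead of guessing a value.
    return record if record.get("price") is not None else None


def price_with_tax(record: dict) -> float:
    # Toy "model": a fixed 10% markup, purely for illustration.
    return record["price"] * 1.1
```

Adding a new source then means writing one more class with a records() method. The pipeline, and any tests written against it, stay unchanged.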
Solve the underlying problem
It's not about maximizing R^2, minimizing MSE, or improving accuracy. It's about understanding the underlying business problem and how to solve it. Performance metrics can be difficult to nail down. In order to choose the right one, you need to understand the business problem. What metrics are important to the business? Why are they important? How do they affect the business? What is the problem behind the problem that you are really trying to solve?
Start with a simple solution and see if it's good enough. If it isn't, then you can work to improve the weak points. Don't make the mistake of massively over-engineering a solution when a simpler alternative would have been just as good.
Use the agile approach: start by developing a minimum viable product and iteratively improve it until it is good enough.
Write software, not scripts
You don't need a degree in computer science, but you do need to master the basics:
- Data structures
- Testing and automation
- Software design
- Object-oriented design
As a fun way to test your knowledge and practice for interviews, work through Cracking the Coding Interview by solving all the problems in Python with object-oriented code and unit tests.
Work with limitations
The data will be dirty. You won't have enough time to build a perfect model. You won't understand every use case before your model goes into production. You will need to figure out new, creative ways to apply methodologies. It won't be perfect, but it has to work.
Don't fight the limitations. Accept and embrace them so that you can learn to overcome them.
Test thoroughly and automate
Your results can't just be good in theory. They have to be proven with real data. You need to unit-test your code to prove it works. You need to integration-test the entire system. This testing needs to be automated, both for the sake of time and reliability.
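To make the unit-testing point concrete, here is a small sketch using Python's built-in unittest module. The clean_ages function and its 0 to 120 range rule are made up for illustration. The point is that each assumption about the data becomes a test that can be rerun automatically.

```python
import unittest


def clean_ages(ages):
    """Drop missing or implausible ages (hypothetical cleaning rule)."""
    return [a for a in ages if a is not None and 0 <= a <= 120]


class CleanAgesTest(unittest.TestCase):
    def test_drops_missing_values(self):
        self.assertEqual(clean_ages([34, None, 51]), [34, 51])

    def test_drops_out_of_range_values(self):
        self.assertEqual(clean_ages([-1, 0, 120, 121]), [0, 120])

    def test_empty_input_stays_empty(self):
        self.assertEqual(clean_ages([]), [])


def run_tests() -> bool:
    """Run the suite programmatically, e.g. from an automated job."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CleanAgesTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

In practice you would more likely put the tests in their own file and run them with python -m unittest on every change, so that nobody has to remember to do it by hand.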
You need to measure the efficacy of your models against the next best alternatives. The problem you're solving doesn't exist in a vacuum, nor should your solution.
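One simple way to compare against the next best alternative is to score your model and a naive baseline on the same held-out data. The sketch below is only an illustration. The numbers are made up, the baseline (always predict the training mean) is just one possible choice, and the 10% improvement threshold is an arbitrary assumption that the business problem should really set.

```python
from statistics import mean


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


def beats_baseline(actual, model_preds, baseline_preds, min_improvement=0.10):
    """True if the model's MSE is at least `min_improvement` (a fraction) below the baseline's."""
    model_err = mse(actual, model_preds)
    baseline_err = mse(actual, baseline_preds)
    return model_err <= baseline_err * (1 - min_improvement)


# Made-up numbers for illustration only.
train_targets = [10.0, 12.0, 11.0, 13.0]
test_targets = [12.0, 14.0, 10.0]
model_preds = [11.5, 13.0, 10.5]

# Baseline: always predict the mean of the training targets.
baseline_preds = [mean(train_targets)] * len(test_targets)

if beats_baseline(test_targets, model_preds, baseline_preds):
    print("Model clears the baseline by the required margin.")
else:
    print("Model is not clearly better than the baseline.")
```

If a model cannot clear a baseline this simple, the extra complexity is probably not paying for itself yet.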
Focus on the end result
No model will ever be perfect, but the end result has to be excellent. If you build a system to solve the problem, focus on tackling the underlying issues, write solid software, and accept the limitations that you have to operate within, then you will be able to produce a valuable solution that can be built upon in the future.
Embrace the engineering mindset and you will quickly set yourself apart from other candidates. You'll no longer just be a predictive modeler; you will be a data scientist.