How to Hire and Build a Machine Intelligence Team - Part 2 - Assessing and Closing Talent 🤝

Insights from companies at the frontier of AI/ML across stages (seed to public), like Lyft, Neurable, and Recurrency, on how they source, assess, and build technical teams.

If you’re a startup founder or early employee interested in building your AI team from scratch, or think you might be one day, this is for you! As a reminder: I interviewed experts at startups across stages (seed to post-IPO), synthesized common themes, and added some of my own takes to produce this write-up, aimed at a better and more transparent end-to-end process for hiring in machine intelligence.

Read Part 1 if you haven’t already: How to Hire and Build a Machine Intelligence Team - Part 1 - Finding Talent

Now on to Part 2 on Assessing and Closing Talent…

Assessing Talent

Technical Chops

Technical interviews for ML differ quite a bit from software engineering tests based on algorithms and data structures. After all, pulling together a well-oiled pipeline and model is more about experimentation and intuition: sometimes grounded in mathematical foundations, other times exhibiting strange, unexplainable behaviours.

Early-stage hiring is like designing a brand-new puzzle: you decide what kind of puzzle to make, colour or impose images on the individual pieces, and dictate how they fit together. It’s the making of a novel machine, all parts from scratch. Later-stage hiring is like retrofitting an old machine, or in the best case creating a new one that is still influenced, consciously or unconsciously, by past operational biases (e.g. “software teams should be run with X principles under Y structure”). The problem becomes how to slot the existing pieces together, occasionally sharpening edges or filling crevices for a snugger fit.

Here are a few examples:

Recurrency (seed) and Neurable (pre-series B) find themselves in a delicate balancing act—hoping to evaluate candidates thoroughly without taking away too much engineering build-time.

At Neurable, the team recommends an “Audition”: a domain-specific, take-home coding assignment simulating everyday work at the company. Following this multi-hour challenge, the applicant pool is narrowed further by evaluating the submissions. The team then schedules time for the final batch of qualified candidates to individually walk through their solutions.

The objective in ML take-homes is to tease out approaches to the problem and observe how complex ideas are communicated, verbally and in writing. Sharp questions include:

  • What other model architectures would you have tried? Why did you pick the current few models?
  • How could you improve the existing features?
  • Where can biases be introduced in the pipeline?
  • Why are these the appropriate metrics for evaluation?
  • What improvements would you make if you had more time?

At Recurrency, candidates are assessed in a loop-style series of behavioural interviews with each team member. Emphasis is placed on digging deeply into the candidate’s contributions to past projects and their impact. The team then considers how well this fits the current objectives, and asks questions structured around the product, e.g. “what would you change about the current features, and why?” Speed of execution is the difference between life and death for a startup, which is why product-centric evaluations are essential to understanding whether an individual can jump right in and guide the trajectory of the rocketship.

At Lyft, Syngenta, and FinCo, the tests are extremely thorough in comparison, with some overlap between them. We can bucket these tests as probing three main classes of mastery, and adopting them early in a startup’s life can be very rewarding.

1. Math: this takes the form of solving pure statistics, linear algebra, and probability questions, and/or problems that use these concepts in the context of ML. Derive and explain why the loss function of logistic regression takes the form it does. Calculate one gradient update step with a particular learning rate (a worked sketch of this follows the list). Compare and comment on two distributions. This type of question distinguishes people who have the first-principles understanding and foundations necessary for the crucial parts of the job outside of “model.fit()”.

2. Coding exercises (usually in Python): these are the traditional data-structure questions involving dictionaries, hashmaps, or trees, which come up constantly when querying, processing, and updating large datasets (an example in this style is sketched below).

3. ML take-home customized for the candidate’s area of expertise: example areas include NLP, optimization, and Bayesian inference. Here the candidate is asked to build a relatively straightforward end-to-end pipeline, or to build one particular component (e.g. feature selection) well (also sketched below).
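
To make the math bucket concrete, here is a minimal sketch of the gradient-update question, written in plain numpy with toy numbers I made up for illustration; it performs one gradient-descent step on logistic regression’s binary cross-entropy loss:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 examples, 2 features (numbers are made up for illustration).
X = np.array([[0.5, 1.2], [1.0, -0.3], [-1.5, 0.8], [0.2, 0.1]])
y = np.array([1.0, 0.0, 0.0, 1.0])

w = np.zeros(2)   # initial weights
b = 0.0           # initial bias
lr = 0.1          # learning rate

# Forward pass: predicted probabilities p = sigmoid(Xw + b).
p = sigmoid(X @ w + b)

# Gradients of the mean binary cross-entropy loss
# L = -mean(y*log(p) + (1-y)*log(1-p)); they reduce to (p - y) terms.
grad_w = X.T @ (p - y) / len(y)
grad_b = np.mean(p - y)

# One gradient-descent update step.
w -= lr * grad_w
b -= lr * grad_b
print(w, b)
```

A strong candidate can both produce the few lines of algebra behind this and explain why the (p - y) form falls out of pairing the sigmoid with the log loss.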
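
For the coding bucket, here is a hypothetical question in the usual style (the function name and data are mine, not from any company’s actual loop): aggregate a large event log with a dictionary, then pull out the top results with a heap.

```python
import heapq
from collections import defaultdict

def top_k_users(events, k):
    """Return the k user_ids with the most events.

    `events` is an iterable of (user_id, event_type) pairs -- the kind of
    dictionary-based aggregation common when processing large datasets.
    """
    counts = defaultdict(int)
    for user_id, _event_type in events:
        counts[user_id] += 1
    # nlargest runs in O(n log k); complexity is a typical follow-up.
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])

# Made-up usage:
events = [("ana", "click"), ("bo", "view"), ("ana", "view"), ("cy", "click")]
print(top_k_users(events, 2))  # [('ana', 2), ('bo', 1)]
```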
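
And for the take-home bucket, the feature-selection component mentioned above can be assessed at roughly this scale; a minimal scikit-learn sketch, using a built-in dataset as a stand-in for real company data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Keeping selection inside the pipeline means it is re-fit on each
# training fold during cross-validation, which avoids leakage.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])
print(cross_val_score(pipe, X, y, cv=5).mean())
```

What you grade is less the score than the choices: why k=10, why univariate F-tests over, say, L1 regularization, and how the candidate defends the trade-offs.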

The above assessments are elaborate for a small team to adopt as a recurring practice under time and capital constraints, but it’s certainly not impossible. In the spirit of hiring slowly and deliberately, there are absolutely ways an early-stage founder can gauge competence in most, if not all, of the above areas by relying on a few heuristics.

I’d advocate for implementing what I call short-circuits (analogous to short-circuit evaluation in programming): set up early tests that let you make a “no” decision swiftly.
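
As a toy illustration of the analogy (the stages, field names, and thresholds below are all hypothetical):

```python
# Hypothetical screening stages, ordered from cheapest to most expensive.
def meets_baseline_requirements(candidate):  # minutes: a resume scan
    return candidate["years_ml"] >= 2

def passes_take_home(candidate):             # hours: grading a submission
    return candidate["take_home_score"] >= 0.8

def passes_onsite_loop(candidate):           # days: a full interview loop
    return candidate["onsite_votes"] >= 3

def passes_screen(candidate):
    # Python's `and` short-circuits: the moment a cheap early test fails,
    # the later, more expensive ones never run. Front-loading the
    # non-negotiables makes a "no" arrive as cheaply as possible.
    return (
        meets_baseline_requirements(candidate)
        and passes_take_home(candidate)
        and passes_onsite_loop(candidate)
    )

# Decided at the resume stage; the take-home is never even graded.
print(passes_screen({"years_ml": 1, "take_home_score": 0.9, "onsite_votes": 4}))  # False
```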

Saying no quickly also means you have the luxury of spending more time with those who are better suited for the role, gathering more data points before you decide on your champion.

Cut rational corners in Q&A. Set up baseline roadblocks in places you’d expect a strong candidate to navigate around. Partially automate or outsource certain technical assessments. Deliver evaluations that unlock multiple areas of interest at once. Get creative. Front-load tests that cover the essential, non-negotiable skills first.

In screening candidates and deciding quickly, it never hurts to check for unconscious (or conscious) biases that might affect hiring criteria; the unintentional cultivation of a diversity or culture problem is hard to reverse once it’s ingrained. And as much as you push for efficiency, remember to respect the candidate’s overall experience, which matters in shaping your reputation as a workplace and a team.

“Startup” Traits

Outside of technical skills, AI/ML founders can look for certain soft skills and behavioural cues to qualify candidates.

Throughout my interviews, the number-one-rated skill was communication. As AI is an interdisciplinary field, you need to communicate with multiple stakeholders cross-functionally. Audiences who need to understand you range from customers reading technical tutorials, to teammates collaborating on a bug fix, to investors listening to you present a data visualization.

Another skill is curiosity-driven learning: asking pointed questions, bringing thoughtfulness and salience to conversations, and being continuously proactive and humble in seeking the opinions of domain experts and teammates.

The non-obvious, nuanced trait that is often overlooked is critical thinking: in particular, thinking beyond the interesting technical challenge to how business outcomes will be affected, and to the ethical concerns of a new technology that lacks established guidelines.

Awareness or general interest around the following areas can be informative indicators:

  • How do the cross-validation metrics translate to product performance and customer experience?
  • What are suspected sources of data leakage when performance is too good to be true? Is the algorithm robust? (One common leakage pitfall is sketched after this list.)
  • Is the model’s training data actually representative of people of all backgrounds?
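
On the leakage question, a classic “too good to be true” source that strong candidates spot immediately is preprocessing fit on the full dataset before splitting. A minimal scikit-learn sketch on synthetic data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 2, size=1000)

# Leaky: the scaler's mean/variance are computed over rows that will end
# up in the test set, so test-set statistics bleed into training.
X_leaky = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_leaky, y, random_state=0)

# Correct: split first, then fit preprocessing on the training split only.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)
```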

Even better if candidates ask these questions unprompted. Right away, we know that they frequently think beyond the code, keep big-picture business models and needs in mind, and reflect on the consequences of model deployment.

Conclusion:

1. Every member of your team who connects with a candidate needs to keep in mind that conversations are as much about selling the candidate on joining a once-in-a-lifetime opportunity as they are about understanding their capabilities in ML.

2. Time spent with each candidate should serve as a negative or positive example to enrich and refine an overall talent pattern-matching engine, constituted partly by the workflow and setup of your recruiting system, and partly by your personal judgement as a manager.

Hope this mini-series was helpful or enjoyable!

You can always reach me at alex@marct.ai to let me know your thoughts.

Alex