MLOps Blog

Managing Computer Vision Projects with Michał Tadeusiak

Stephen Oladele

17 min

19th December, 2023

Computer Vision MLOps

This article was originally an episode of the MLOps Live, an interactive Q&A session where ML practitioners answer questions from other ML practitioners.

Every episode is focused on one specific ML topic, and during this one, we talked to Michal Tadeusiak about managing computer vision projects.

You can watch it on YouTube:

Or listen to it as a podcast on:

But if you prefer a written version, here it is!

You’ll learn about:

1 Steps and milestones of a computer vision project
2 Non-technical side of managing computer vision projects
3 Biggest failures and lessons learned
4 Structuring the team for CV projects
5 …and more.

Sabine Nyholm: Welcome back to another episode of MLOps Live. I’m Sabine, your host. I’m joined by my co-host, Stephen, and with us today, we have Michal Tadeusiak, who will be answering questions about managing computer vision projects.

Michal is the Director of AI at Deepsense.ai. He has two master’s degrees in Complex Systems Science from École Polytechnique and the University of Warwick. He has led several data science projects spanning multiple industries like manufacturing, retail, healthcare, insurance, safety, et cetera. Also, science projects around technologies like predictive modeling, computer vision, NLP, and several profiles like commercial proof of concepts and competitions workshops.

It would’ve been much easier for me to list things that you have not done, Michal. Welcome. Anything you’d like to add?

Michal Tadeusiak: Thanks for this introduction. Probably the reason for this is that I have been working for almost seven years right now in a company that’s doing projects for a client. We don’t control the flow of these projects, who is going to come next, and with what kind of a challenge. Therefore, the list was quite broad, I’d say.

Managing computer vision projects in one minute

Sabine: Absolutely. Michal, to warm you up for all this question-answering, how would you explain to us managing computer vision projects in one minute?

Michal: Managing computer vision projects, I’d say that there’s something that is related just to managing projects in general. Especially from my point of view, I’m busy with projects for external clients. This is a major aspect. I’d say that managing people is the most important part.

1 On the one side, this is a client that we have to manage together, the goal, it has to be understood the scope, the timeline, and the expectations. These are all very key and important aspects.
2 The other side is managing the team to develop the project. We see, let’s say, the motivation of people, the feeling of playing to one goal in a team spirit. These are probably the most important parts.

Stephen: Definitely sounds a whole like the typical project management dilemma. Thank you so much for sharing that, Michal.

Typical steps of a computer vision project

Stephen: We are focusing on computer vision right now, can you walk me through your typical steps for setting up a vanilla computer vision project from when you start discussing the business side of things and coming up with the business requirements with the stakeholders? Can you walk me through that concept?

Michal: The computer vision part it’s usually not so relevant in the first place.

1. You want to start with

understanding the goal,
understanding the business purpose of the entire corporation
what the client needs.

It doesn’t have to be at the beginning too a technical discussion. The discussions are rather to understand major pain points and then how in general, the business works there.

Then also to discuss more technical details. Then depending on the maturity of the client on the other side and how technical they are, then you can go deeper or not into technicalities.

In general, the first thing is to translate this business problem into technical terms, especially machine learning terms.

2. Then, what’s usually the first thing to do after defining the goal, the scope is to see the data. What I mean is to be sure that there are enough data and labels to address to tackle the task. Especially in computer vision projects, this is something that I’d say is quite easy compared to different ML projects.

In computer vision, once you have access to even a sample of images, it’s already quite clear how difficult the problem might be. When we speak about like NLP problems or classical ML problems with tabular data when the data can be spread in huge databases. You have to do a lot of cleaning up, merging, et cetera. This is a much harder thing. For computer vision, this is quite an easier thing to do.

3. Once we have it, then the general life cycle of developing the project goes on. Where still, one of the most important aspects is that you have to work closely with the client having them in the loop and having them in the decision processes. Also, in the evaluation of the solution, like why it evolves and why it changes.

Stephen: If you can, of course, I know you do a lot of things at Deepsense. Can you walk us through an example of a computer vision project where you talked to the stakeholders, you have to agree on a lot of things with the clients or not, and actually deliver workable software?

Michal: I could give you one of the examples where it was pretty, I’d say a serious project we’ve done for a manufacturer of tires. The question or the problem was the quality assurance of the manufactured tires.

Then this was when we were first asked by the client if we can come up with the models to detect different defects, assess the severity, the types, then maybe also classify them into the ones that need intervention versus the ones that can just pass and they’re fine. We thought, of course, we can, but there are a lot of requirements, a lot of things that have to be in place in order to make it.

In the end, the model is obviously like this major part the data scientists are busy with or the key part, but there are a lot of other things that have to be secured first. It turned out that it was not going to be too easy because it was a pretty preliminary concept ID on their site. They didn’t have the labels. They were yet to build the entire device to collect the data, et cetera.

We were helping them shape the end-to-end approach to the solution. In the end, we not only made the models, and then I made a solution like the ML solution. We designed and developed a labeling tool for them which was specialized for labeling the tires, which is not just like a regular image we have.

One thing is the tire has its sector zones which are relevant, but also that some defects that we were working with could be recognized in the images, like just visual aspects were key ones, like scratches or some bubbles, et cetera. There are also some that were more like deformations, quite hard to notice, like a black piece of rubber. Therefore, we also had some 3D images.

The entire solution was to combine the information from 2D and 3D altogether. Once we’ve done the labeling tool for them and they’ve done the labeling process that we helped them with, then there was the modeling part, which also took some time. It was the solution to quite disturbing their regular workflow with quality assurance, and there were many aspects that were important here. What I mean is what to do with this, what to do with the outputs of the model, like how to even build the further parts of the entire pipeline, the facilities, what to do, et cetera.

Once we knew what kind of defects to expect and what kind would be returning to them, then they could design the further steps. It was end-to-end quite much broader than initially thought.

Stephen: What I can get from here is that it goes beyond the technical aspect. The modeling is just a small part of the deployment. There has to be that business conversation and all of those things. Thank you so much for sharing your knowledge.

Stephen: Say you get hired to clean up a computer vision model, a computer vision project that has been, maybe the team has worked on it before, and it’s a bit messed up right now. You have like 90 days, what do you do first? You get into a team, they started working on this particular CV project, and then it starts somehow. They hire you, of course, to come and clean up the project. What do you do in the first 90 days?

Michal: Ninety days is not too little. When we work with the projects, quite often, we like to split them into 3 months periods, which is roughly 90 days. As you have mentioned, this existing legacy project.

I probably would first try to understand where the issue is. If the maintenance issue or the performance issue, or some issue with the data stream, that’s needed. You would address it in a completely different way, depending on what’s the problem.

1 If the performance is the issue, then the first thing would be to look at the results, look at what the network produces or what the solution produces.
2 Then try to trace it back to looking at the architecture and how it’s built to see where the problem might be. If the problem is with the maintenance because you see that it is like a Spaghetti code, and it’s very hard to even introduce any changes.
3 Then it might be the case that in 90 days, we were able to build a new solution, a cleaner one, and try to revive the existing one.

Ninety days is quite a lot of time for computer vision projects that I am used to.

Non-technical side of managing computer vision projects

Stephen: Just moving away from the technical side a little bit now. I assume that some internal knowledge or skills that may be an organization should have or any team should have before thinking about setting up computer vision projects. What are those? How should I think about it? I will spend quite a reasonable amount of time on the non-technical side. What are those things I should think about when I’m thinking about the metrics, the goals, and everything when managing CV projects?

Michal: Again, I will be talking about it from the perspective of a consultancy company when we want to engage with the clients as broadly as we can also, in this business part, we are not only on the technical side. I’d say that what is necessary there on the client-side very much depends on the maturity of the clients. Maybe, I would start from the easier ones.

1 The easier ones are the ones that already have some data science teams in place, and it’s their partner about their thing to work with ML or computer vision every day. They just don’t have enough capacity to solve all the challenges. Then we are there to help.
2 The more interesting ones are the ones that don’t have the data science teams, or sometimes they don’t even have software developers in the way that they are companies that live in the 21st century. There’s IT, obviously, there are a lot of systems in place, et cetera, but they may not even have software developers that would then take over or maybe that they could re-qualify or change their account skills to become some junior data scientist, let’s say. These are the most interesting ones. We like to work with such clients because then there is a lot of knowledge from their work and from our work.

Then what is needed in such cases is definitely this awareness that by being open, we may not be able to specify how good something will work in the first place. Data science is still very close, it’s somewhere between engineering and research, and then it’s often quite hard to tell in the first place the very precise performance that will be obtained. Being open to solving the problem together is the key part.

This usually is not about technical people in the company, but rather in with the stakeholders, and then I don’t know, business owners that they will be committed to solving the problem together, being open to this transfer of the knowledge information in both ways.

Business side vs technical side of the project

Stephen: Have you had situations when the business people do not agree with the technical people? Have you had situations like that? Of course, you mentioned that there has to be a transfer of knowledge. What if that bridge is a common occurrence where the business stakeholders do not agree with technical stakeholders? Have you encountered such situations, and how would you advise that technical teams navigate that?

Michal: This wasn’t usually the case with a technical approach, and this wouldn’t be accepted. There were some cases actually where the business owners seemed to be also quite technical, and then they would have their own ideas on how to solve things.

Usually, you don’t expect the business owner to impact too much on your plan for the realization and development of the solution.

Of course, what you expect is rather to guide you through the priorities, like being able to answer specifics about the problem but not necessarily proposing which architecture or which model to use.

Therefore, we had some issues with it. Then, in the end, what can I say? In the end, it was just more time-consuming for both parts to get aligned, and it involved actually working on those ideas that the client had. Plus, pushing with our ideas whenever the other ones were not really sufficient. That was tricky.

I’m not sure if I can give you a better answer to this, just to go into this dialogue with the client and then prepare them that if we want to try also the things that you’re proposing, then we’ll definitely spend more time on it. Then if you’re fine with this, then let’s do this.

Go-to architecture for computer vision projects

Sabine: Totally makes sense that there are different needs depending on the task. The next question here is about whether you have some go-to architecture over the years that has proven to be the most robust for computer vision projects. Do you have such a thing?

Michal: I will be frank that we probably used to have when I was closer to the technical aspects when I was rather in technical leader roles than I was much closer to shaping the solution. Back then, when we were to make a choice, then depending on certain requirements, what I mean is if we had some computational limitations, then there was a major thing to take into account.

Then we would rather stick to say YOLO for the one that doesn’t use so many resources or Mask R-CNN or Faster R-CNN, which was usually much more accurate, but, obviously, heavier to use. It was some time ago, but they’re still pretty popular. There are new generations of YOLO that are still being updated. Then Faster R-CNN or Mask R-CNN, they’re still used as a framework but with different backbones. They’re still there, but sometimes, neither of them was working well or well enough for us.

It’s also good to experiment with different architectures. Don’t be shy about trying different architectures if the problem at hand is not so, let’s say, common. Also, to give you some examples. One of the projects, when we had to develop our own architecture, it was also some time ago, like a bit more than four years ago.

We dealt with the images of schematics of the chemical facility where there were a lot of pumps, valves, the pipe between them, et cetera. This schematic would usually look like – you would have a sheet with a few 100s, say 200 different symbols. Some of them are very similar to each other because you could have like five different types of valve, but as a symbol, they wouldn’t differ too much. Then symbols from some measurement device, et cetera. There was not really a good network for such things. Mask R-CNN or Faster R-CNN, in this case, would be just like would work with the images where you have several objects but rather large, not so small.

We had to come up with a dedicated architecture, which I’d say, if we’re to be compared with some existing ones, would be some kind of a future pyramid network, plus fully conventional, I don’t want to get too much into details, but yes, sometimes that’s the only way to go is to experiment and be creative.

Sabine: I’m sure that’s pretty impressive.

Michal: Although, I’d rather say that it’s worth starting with some existing solutions just for the baseline, definitely faster and more reliable.

Milestones of a computer vision project

Sabine: We want to know a bit more about your project process. How many milestones would you typically have in such a process?

Michal: Milestones are one part of this entire picture. What I mean is when discussing how to deal with computer vision projects, in general, machine learning projects. I personally like a lot the crisp DM workflow or methodology, which is one of the most popular in the area. It was one of the first ones, and then some of the things that are not fully captured by this methodology. There’s no way golden grails, a lot of them, different ones which currently there would be already several different methodologies. In the end, they wouldn’t differ too much from each other.

What I mean is that the core thing in the machine learning project is the interactive process of solving the problem and coming back to the milestones.

1. One of the first milestones that I’d say I would have there is if we don’t have the data before starting the project to have the data in place, the data set in place. This is one of the core things to have. We like to have the data set in place before starting. Sometimes it’s not possible. Sometimes the data is being collected on the way, or only once we talked more with the client we understood better. They understand what’s needed, and then they would give us more of the data needed.

The first milestone would be to have the, let’s say, operational data set, something you could start with. Usually, this should be pretty early on in the project.

2. Then the further milestone is something that I would call a skeleton of the solution. When managing the ML projects also computer vision once, what I think is the most important is to have a minimal but working end-to-end solution in the first place.

What I mean is something that would use the data, even if it’s a small data set or just part of the data set you’re going to have. You will be able to load it to do some minimalistic preprocessing or anything that’s needed just to be able to run the models and train the first models. You don’t even have to have the models in the first place. You may use some heuristics or some constant models. Something that you don’t even need to train for this first end-to-end solution.

3. Then the validation procedure and computation of the metrics, sort of evaluation step.

4. Once you have it, then probably some of the deliverables. You probably would need to return what you found there in some format, some JSON, or some visualization, so also this piece.

Once you have this end-to-end solution in place, even if it’s really basic, you would already have something to build upon. This serves multiple purposes.

One is that now the problem can be split into pieces, and people can work in parallel on different aspects.
- One person can work on the data, put more data,
- the other could start with more elaborate validation procedures or the model.

It gets even more, let’s say, parallel the further you are in the development.

The other thing is that you start the main monitor the results, and you’ll have the first benchmarks to compare with. It’s very hard to say, let’s say, you get at some point 90% performance or something, F1, let’s say. You can’t even say if it’s good or bad if you don’t know what would be some baseline score of random guessing or some heuristic.

Once you have them, only then can you assess the performance and then do the milestones, the second milestone would be to have the skeleton.

5. Then, the remaining milestones, that is, for a project-specific. Depending on if it’s closer to the users, you would expect some user feedback at some point.

6. Another milestone would be to have an MVP that could be shown to some users to play with. If it’s more about being part of some web application, where would you have it in the way that you can build the backend over on this and then serve through some API to this front end? Then it depends.

For sure, those first two are the major ones, most important.

Sabine: That was certainly some good milestone in detail, right there. So Stephen, back to you if we have you on the line.

Biggest failures in computer vision projects

Stephen: I’m wondering, obviously, when Sabine read your profile earlier on, and the numerous works you’ve done as well are quite inspiring. I wanted to know, do you have what we call the biggest failures you’ve had over the long-term working on computer vision-related projects that you’d love to share? Maybe, it’s things you should know about, and maybe you’ve even overcome them.

Michal: The failures, for sure, there were many failures. One of them is when we didn’t even start the project, the failure was in the first place that we were not able to come to the same terms with the client.

1. The client was very, let’s say, they were very far from the IT area. They were manufacturers of some plastic buckets, I’m not even sure what they wanted to have. It was an AI-based monitoring solution in place to observe if people are not entering the areas where they shouldn’t go because there are some heavy-duty machines working or some other cases. Like if the specialise trash bins for something like plastic stuff or some metal remaining if they’re getting full. Then, they could schedule to empty them. So like different aspects.

In general, those things wouldn’t be too hard to do from the ML perspective. They were treating us the same way they would treat a company that would just reorganize their workplace, like put some walls, extra walls there, or some installer of some extra lighting. It’s like they were requesting very stiff timelines with no way for buffers. They were very strict with us defining the performance, the requirements, et cetera. We discussed with them for a few weeks and even went there to this place.

In the end, you just couldn’t start it. The difference is between the language we use and also the expectations were just too big to even start. That’s one of the failures. We spent quite a lot of time invested in the project, but in the end, we didn’t even start.

2. I have also another one, it’s a failure, but it’s also a success to some extent. What I mean is they’re a very complicated project, we’ve done for 10 months or something, which in the end just turned out to be not so successful. What I mean is it was a very big team working on the project. It was a project involving ML aspects but also a lot of different things like front and UX, UI, and building the entire solution backend.

In the end, it turns out that this, but I believe, nowadays, technology is not able to solve this particular task. We were always almost there but had never reached it. Then, it was a failure in the end, we didn’t really solve it, and a lot of money went there, so, quite a failure.

Although, the good things are that we still working with this client. They figured out that it wasn’t possible really to solve it with the existing technology. They still believed in us and that we made whatever we could to do this. We still cooperate together for a few years already. In the end, success also.

Stephen: We definitely love war stories in this podcast. These are lessons from the trenches, which we reckon that teams can definitely take as take-home lessons.

Managing computer vision projects vs managing other ML projects

Stephen: I think you mentioned something about computer vision projects being easier to manage earlier on. I would love to know, in your opinion, what’s the difference between managing a computer vision project as well as, say, any other ML project, NLP, or all those other projects? What are those differences you’d love to share?

Michal: Each one of those computer vision, NLP, and, let’s say, some tabular database projects. They would be different, and they would have their own difficulty. 1. Starting with the tabular ones, when we speak about rather classical machine learning, also part of the task is to build the features. The tricky part here is usually that it’s rarely the case that you have one place where the data is stored, and it’s in good shape already.

What I mean is, usually, especially when you’re dealing with not so much mature and technical clients, which are just getting in the AI, they would like to see the AI be of help. They might not be mature enough to even have one data lake or one source of the data. The difficulty is to be able to get access to multiple sources of data, combine them together, learn where all this data that might be useful is, and how to combine it. This usually is very hard to assess how much time this will take, and this is one of the difficulties. You can’t also assess how much information there is in the data. Unless this is a very simpler problem, when you can just use one table and everything is there, then this is the tricky part of those projects.

2. With NLP, the problem would usually be the data. What I mean is, when comparing to computer vision, you would have quite often some open-source datasets that are at least similar to what you do. What I mean is that you may want to do defect detection on the tires, and there might not be this particular data set there, but there might be some defect detection on a C-sheet or some carbon fiber sheet. Something that, in the end, will be quite similar. You can reuse quite a lot of existing neural nets or portray on some existing data. Therefore, the need for data is much smaller, you don’t have to have too much to already produce something useful for the client.

With NLP, that’s also the case that most of the time, transfer learning is used. Then some generic-purpose models are used to answer the specific needs of the client. What I mean is you won’t find so useful very similar NLP datasets in the open-source; this is one thing.

The other thing is, if you want to produce those datasets, this is not so easy to generate automatically. With OCR problems, there are plenty of artificially generated images for OCR training purposes. With NLP, that’s not so easy. You can use some existing generative models like GPT, et cetera. In the end, what you do is you train the model to not deal with the real data, heuristic data, but reverse engineer the GPT, or what you can get is something that, in the end, won’t work well on the clients, on the real-life data. The data is usually the problem in the other ones.

Managing compute costs in computer vision

Stephen: Let’s dig deeper into the data aspect a little bit. I would love to know, have you encountered a situation whereby, because, with computer vision projects, I think one of the things is that it could get compute-intensive. Maybe you’re dealing with higher resolution datasets, and then you have to use a distributed architecture, soft process the entire thing, or maybe the model you’re training as well.

Have you encountered a situation whereby you’re managing this project, and the compute costs get out of budget? It’s not something you budgeted for. Have you encountered that sort of situation and navigated that path with your team?

Michal: I’d say that this is something that you could assess already in the early stages of the project. Now, when discussing with a client, there is the computational power needed to deal with the scale that’s in place. We usually would say, if it’s a relatively small project, like tens of thousands of images to be processed, we have our own server farm, so we can use it, or we can just do it in the cloud, but just having a few VMs to do that job.

If you see that there will be a need for very computationally heavy projects, usually, we’re already in the deal as I secure it. Usually, we prefer that it’s done on the client’s side, in the client’s cloud, or on their accounts. We don’t have to re-invoice or deal with it. It’s already there on the client side. That’s how we usually would try to solve it.

I just recall that we had some projects where the computational budget was, especially if you work in the cloud, and this is not particularly maybe computer vision case. Let’s say if you work with a future engineer on some use some BigQuery’s or infrastructure, it can get very costly if it’s not well-designed. We had certain situations when it was pretty costly.

In computer vision, it’s more controllable because we don’t have these complex interactions of different computationally heavy operations. Once you know which model you use and times the amount of data, you are already able to assess the cost pretty well.

Lessons learned from managing computer vision projects

Stephen: We’re going into the final wrapping up. This is the time for lessons, the lessons you’ve learned, and so forth. Do you have maybe any lessons you’ve learned from projects over time that you think very small, visible scale teams can take on and incorporate through their computer vision projects, managing the end-to-end process for it?

Michal: If I were to look back into my early days of how I was approaching the projects, there was this eagerness. Then this need to have the solution or the results fast, hack some solution, use the network that you love the most or something fancy in the very first place, and try to bring some results very quickly. This is something that we would definitely avoid now, been there or let many projects.

The issue you would have if you just approach this, maybe you’ll have some very nice results after the first week. Then you have a solution that is not, you have to rebuild anyway, you can’t build upon it. You just have to put it in the trash and start seriously.

The other thing is when working with a client, then you have some very good results after the first week. Then the feeling that the project stalled because it’s quite easy to get some initially good results in the first place, then once you get into details, when you try to improve them, then it’s got tricky. Then usually, there’s this feeling that it’s stagnated.

Definitely, I would rather prefer to build the most possibly basic solution in the first place, end-to-end, then iteratively improve it. Also, observing this interactive process of improvement in a scene that it’s not so easy and straightforward.

This is something that you have time for thought process necessary for the data scientist to understand the problem better and also build some stable solution. Also, for the client to have… You see that it’s not such AI easy thing to do.

It’s a process. It’s not just you will take this AI break, put it there, and it will magically work. There’s no magic. It’s just hard work, time, and engineering.

Check also

Building and Deploying CV Models: Lessons Learned From Computer Vision Engineer

How to structure a team for computer vision projects

Stephen: It’s going to be good to end on the note of people management, and that’s a final note. Do you have a special way you structure the particular teams that work on these projects? Maybe, for example, you have a research-based computer vision engineer who’s working on the model development stuff. Do you need to have a separate say, officer engineer doing the deployment side, or how do you typically structure a team for a computer vision project?

Michal: Again, it depends on what, in the end, is to be delivered. If it’s just a POC, then just a data scientist would probably be enough.

Although, it’s worth having in mind that there should be team players. But it also the goal is to deliver certain solutions deployed, productize it. What we do, quite often or usually, and what I like a lot, is to have an interdisciplinary team pretty much as soon as it’s possible.

What I mean is when data scientists are working hand in hand with software engineers or MLOps engineers, that would then take over or wrap up the solution. Usually, the case is that it’s not that you have a certain point in time when the MLOps part of work is done and then can be productized.

Usually, it’s a smooth thing when already the improvements are small. It can be already deployed, but there’s still a work-in-progress phase. Then when these people are working hand in hand, they learn what the challenges and the requirements on both ends of these projects are. This is very beneficial.

People like it because they’re also exposed to some challenges or problems they’re not so much accustomed to. This is something that we like to do.

Sabine: If we can squeeze in a quick audience question before we wrap up. Gabriel would like to know any thoughts on computer vision data management. It seems more complex than regular tabular data. Any closing thoughts on that?

Michal: As I explained at some point to me, I wouldn’t say it’s much more complex. It’s just a different way, different approach, probably even easier in the way that, as people, we can assess how the models work. It’s much easier to debug, I’d say, it’s much, much easier to see where the problems lie and then how to address them. I would say they’re even easier. Obviously, different technologies are using what, for most of the time, deep learning, so different skills. Otherwise, at least for me, it’s easier to manage those.

Michal: I didn’t get the question. It was data management. Sorry, not the project management. Now, I see it.

Probably, what’s more, tricky is the size, so that it’s just there much heavier. If you can do some deduplication in the first place or sampling, then definitely something that is worth doing. To be honest, we usually have the problem with too small datasets, not too large.

But, in the ones where there is a lot of data, usually, you don’t need to process all of them. This is more about picking, for some active learning or for knowing where the data comes from and knowing the metadata to focus on the data that are the most relevant to start with.