Artificial Intelligence has been the center of attention in every major sector for some time now. With this surge, people are trying to apply ML/DL algorithms to all kinds of problems. But in the haste of building an “AI application”, are we forgetting something?
There are two schools of thought in AI: the data-driven approach and the model-driven approach. The data-driven approach focuses on improving data quality and data governance to get better performance on a specific problem. The model-driven approach, on the other hand, tries to build new models and algorithmic improvements to get better performance. Ideally, the two should go hand in hand, but in reality we have progressed much further on the model-driven side than on the data-driven side. There is still a lot to be desired in data governance, data management, and data quality handling, as well as in general awareness of these issues.
The recent work on Covid-19 is a good example. While the world was reeling from the pandemic, a lot of AI-related work surfaced, whether it was detecting Covid-19 from CT scans, X-rays, or other medical imaging, predicting disease progression, or even predicting the total number of deaths. On one hand, this large body of work from across the globe has improved our knowledge of the disease itself and, in some places, helped clinical workers cope with large numbers of patients. On the other hand, very few of these studies were deemed fit for practical use in settings such as healthcare. The reason for such low practicality can be traced back mainly to data quality issues. A huge number of projects and studies were built on images duplicated from other datasets. Beyond that, there is a distinct lack of external validation and of demographic information about the training data. Most of these projects would fail a systematic review, and most do not report their biases. Hence, the reported performance cannot be translated to real-world applications.
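To make the duplicate-image problem concrete, here is a minimal sketch of one way to check whether an external validation set overlaps with the training data. It is an illustration under stated assumptions, not the method used in any of the studies mentioned above: the function names (`content_hash`, `find_cross_dataset_duplicates`) and the in-memory `name -> bytes` layout are hypothetical, and hashing raw bytes only catches byte-identical copies, not images that were resized or re-encoded.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Return a SHA-256 hex digest of the raw file contents."""
    return hashlib.sha256(data).hexdigest()


def find_cross_dataset_duplicates(train: dict, external: dict) -> list:
    """Return sorted (train_name, external_name) pairs with identical bytes.

    Both arguments are hypothetical mappings of file name -> raw bytes.
    Only exact, byte-for-byte copies are detected.
    """
    # Index the training files by the hash of their contents.
    by_hash = defaultdict(list)
    for name, data in train.items():
        by_hash[content_hash(data)].append(name)

    # Look up every external file in that index.
    overlaps = []
    for name, data in external.items():
        for train_name in by_hash.get(content_hash(data), []):
            overlaps.append((train_name, name))
    return sorted(overlaps)


if __name__ == "__main__":
    # Toy byte strings standing in for image files.
    train = {"a.png": b"\x00\x01", "b.png": b"\x02"}
    external = {"x.png": b"\x00\x01", "y.png": b"\x03"}
    print(find_cross_dataset_duplicates(train, external))
```

Any pair this check reports means the "external" set is not truly independent of the training data, so performance measured on it is likely to be optimistic. Catching near-duplicates would need a different technique, such as perceptual hashing, which is outside the scope of this sketch.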
One important thing to remember in data science is that bad data can never lead to great performance, however good your model is. Awareness of systematic data collection, management, and consumption is necessary for a data science project to have real-world applications. Only then can society reap the benefits of the ‘marvelous AI’.