Special Seminar in CMS
In this presentation, we investigate the effectiveness of time-dependent data in improving the quality of AI-based products and services. Time-dependency means that data loses its relevance to the problem over time; this loss degrades the algorithm's performance and, in turn, the business value it creates. We model time-dependency as a shift in the underlying probability distribution and derive several counter-intuitive results.
We prove theoretically that even an infinite amount of data collected over time may have limited relevance for predicting the future, so that an algorithm trained on a current dataset of bounded size can attain comparable performance. Moreover, we show that a firm's optimal growth strategy involves a shift in attention from the stock of available data as the primary value driver to the flow of new data as the dataset grows.
We complement our theoretical results with an experiment: we empirically measure the value loss in text data for the next-word prediction task. The measurements confirm the significance of time-dependency and value depreciation in AI-based businesses. For example, after seven years, 100 MB of text data becomes only as useful as 50 MB of current data for next-word prediction.
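The reported figure can be read as an effective-size calculation. Assuming, purely for illustration, that data value depreciates exponentially, the 100 MB → 50 MB equivalence over seven years corresponds to a seven-year half-life:

```python
def effective_size(size_mb, age_years, half_life_years=7.0):
    """Current-data equivalent of `size_mb` of data collected `age_years` ago,
    under an assumed exponential depreciation with the given half-life."""
    return size_mb * 0.5 ** (age_years / half_life_years)

print(effective_size(100, 7))  # → 50.0
```

The exponential form and the half-life parameter are assumptions fit to the single data point quoted above; the talk's empirical measurements determine the actual depreciation profile.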