Special Seminar in CMS

Tuesday, September 7, 2021, 12:00 PM

Time and the Value of Data

Speaker: Ehsan Valavi, Technology and Operations Management, Harvard Business School
Location: Online Event

In this presentation, we investigate how effectively time-dependent data improves the quality of AI-based products and services. Time-dependency means that data loses its relevance to a problem over time. This loss degrades the algorithm's performance and thereby reduces the business value created. We model time-dependency as a shift in the probability distribution and derive several counter-intuitive results.

We prove theoretically that even an infinite amount of data collected over time may have limited relevance for predicting the future, and that an algorithm trained on a current dataset of bounded size can attain similar performance. Moreover, we show that a firm's ideal growth strategy shifts attention, as the dataset grows, from the stock of available data as the primary value driver to the flow of data.

We complement our theoretical results with an experiment that empirically measures the value loss in text data for the next-word prediction task. The measurements confirm the significance of time-dependency and value depreciation in AI-based businesses. For example, after seven years, 100 MB of text data becomes as useful as 50 MB of current data for the next-word prediction task.

Series: Special Seminars in Computing + Mathematical Sciences

Contact: Diana Bohler at 626-395-1768 or dbohler@caltech.edu