Test measurement data produced by integrated circuits (ICs) can be very expensive and time-consuming to collect. In this work, the relationship between characteristics of a failing IC and the amount of test data needed to produce an optimal diagnosis result is analyzed in order to optimize test data volume. Initial experiments that analyze diagnosis quality as a function of the amount of test data collected from six independently-tested blocks of an actual fabricated IC reveal that the amount of data needed can, in general, be reduced. Machine learning techniques are then applied: logistic regression over several parameters is used to predict the amount of test data needed for an optimal diagnosis result. Parameters used in the regression include the number of (i) failing bits, (ii) unique failing bits, and (iii) failing patterns. On average, this technique correctly predicts the amount of test data needed 80% of the time. We anticipate that this model can be applied in real time to terminate testing and data collection for each individual IC.
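The prediction step described above can be sketched as a binary logistic-regression classifier that decides, from per-chip failure statistics, whether the test data collected so far suffices for an optimal diagnosis. The sketch below is illustrative only: the feature scaling, training data, and decision threshold are assumptions, not details from the paper, and the model is a minimal from-scratch implementation rather than the authors' actual regression setup.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights (last entry is the bias) by
    batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            err = sigmoid(z) - yi  # prediction error for this sample
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err  # bias gradient
        for j in range(len(w)):
            w[j] -= lr * grad[j] / len(X)
    return w

def predict(w, x):
    """1 = 'enough test data collected for an optimal diagnosis'."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1 if sigmoid(z) >= 0.5 else 0

# Hypothetical training samples: [failing bits, unique failing bits,
# failing patterns], normalized to [0, 1]. Labels are synthetic.
X = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.8], [0.2, 0.1, 0.2], [0.1, 0.2, 0.1]]
y = [1, 1, 0, 0]

w = train_logistic(X, y)
print(predict(w, [0.85, 0.85, 0.75]))  # chip with many observed failures
print(predict(w, [0.15, 0.10, 0.15]))  # chip with few observed failures
```

In a real-time deployment such a classifier would be evaluated after each test pattern, terminating data collection for a chip as soon as the model predicts that further measurements will not improve the diagnosis.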