{"id":2427,"date":"2015-11-22T14:05:49","date_gmt":"2015-11-22T19:05:49","guid":{"rendered":"https:\/\/digital.hbs.edu\/platform-digit\/submission\/ayasdi-a-new-way-to-think-about-data\/"},"modified":"2015-11-22T14:05:49","modified_gmt":"2015-11-22T19:05:49","slug":"ayasdi-a-new-way-to-think-about-data","status":"publish","type":"hck-submission","link":"https:\/\/d3.harvard.edu\/platform-digit\/submission\/ayasdi-a-new-way-to-think-about-data\/","title":{"rendered":"Ayasdi: A New Way to Think About Data"},"content":{"rendered":"
Best practice in data analysis is to begin with a question, form a hypothesis, design an experiment, and perform the appropriate analysis to test for correlation or causation between dependent and independent variables.\u00a0 In business, this analysis is usually used to predict the future behavior of customers, sales teams, and so on.<\/p>\n
In reality, most businesses set out to solve a problem at hand, for example fraudulent credit card transactions, by analyzing the data they already have and assuming certain relationships or linkages among inputs or outcomes.\u00a0 This approach is problematic for several reasons: the data is generally not randomized, and the analysis is often either overly localized, which can lead to overfitting, or overly generalized, which lumps all similar outcomes into a single bucket of cause and effect.<\/p>\n
Ayasdi is an enterprise software company that uses topological analysis to help data scientists understand the shape of their data.\u00a0 Topology is primarily concerned with the study of shape (e.g., linear, clustered, or flared).\u00a0 In data, shape encodes structure and meaning.\u00a0 Because business data is often large and high-dimensional, most companies bucket outcomes (e.g., fraud) into a single category and apply one understanding of shape to all of them.\u00a0 By using shape as the first step in the analytical process, Ayasdi allows businesses to build principled local models.<\/p>\n
In the example of credit card fraud, a bank may experience fraud for many reasons but will create a single rules engine to provide predictive criteria for identifying fraudulent behavior.\u00a0 That one engine then has to explain both the fraud committed by someone who steals wallets and uses the cards and the fraud in which bulk user data is stolen from a retailer and used elsewhere.\u00a0 Clearly, these cases are different, and you would expect the two groups to behave differently, which is a clear example of clustering (see image of this analysis below).<\/p>\n