In the fourth week of my project, I sat down with some data scientists to learn exactly what they do and how their work helps address overproduction and keep stores stocked in the right quantities.
We discussed various projects, such as fixture optimization (how physical stores should be laid out, how to stock them, and which clothes to place where), labor optimization (how many people to hire and when to assign them hours), and, more broadly, how to tailor the experience to what customers want in the most efficient way. The project I found most relevant to my research, however, was their propensity-calculating program.
As many of us learn in economics, a consumer’s propensity is their tendency to behave in a certain manner; in our case, the goal was to see which items consumers were most likely to buy given certain prediction (or “feature”) variables about the consumer. The program assesses a customer’s feature variables (e.g., age, gender) and tries to work out which departments, clothing items, brands, and so on are desirable to that customer as they search and browse. It quantifiably predicts the likelihood that the customer will purchase something at each of those category levels, then ranks and stocks items accordingly: items are arranged in the order the customer is predicted to prefer them, based on their calculated propensity for each one (e.g., if someone is big and tall, big-and-tall clothing is promoted for them).
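The ranking idea can be sketched with a toy scoring function. Everything below — the features, category profiles, and weights — is invented for illustration; the team's actual model and features are proprietary:

```python
# Toy propensity ranking: score each category by a weighted sum of feature
# matches between the customer and the category's profile, then sort.
# Profiles and weights here are hypothetical, not the retailer's real ones.

def propensity_score(customer, category_profile, weights):
    """Weighted count of feature matches between customer and category."""
    return sum(w for feature, w in weights.items()
               if customer.get(feature) == category_profile.get(feature))

customer = {"age_band": "35-44", "fit": "big_and_tall"}

# Hypothetical category profiles and feature weights.
categories = {
    "big_and_tall": {"age_band": "35-44", "fit": "big_and_tall"},
    "slim_fit":     {"age_band": "18-24", "fit": "slim"},
    "classic":      {"age_band": "35-44", "fit": "classic"},
}
weights = {"age_band": 1.0, "fit": 2.0}

# Rank categories from highest to lowest propensity for this customer.
ranked = sorted(categories,
                key=lambda c: propensity_score(customer, categories[c], weights),
                reverse=True)
print(ranked)  # → ['big_and_tall', 'classic', 'slim_fit']
```

A real program would learn those weights from purchase history rather than hard-coding them, but the output is the same shape: a per-customer ordering over categories.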
There are many different ways to organize and analyze the data to create such a program, but all of them require the same building blocks: going from feature variables to target variables, and training a model algorithm so that it weights the variables properly and assigns accurate rankings. Unfortunately I cannot disclose the details of this specific program or algorithm, but one important thing I learned, which the scientists said I wouldn’t pick up studying data science in school, is the problem of cleaning your data. Often, not all of the data you have collected is helpful, so it is important to do exploratory data analysis to double-check that as many variables as possible are covered across all of your data points. A simple example I was given to illustrate the problem: if a major feature variable for my program is gender, but only 20% of my data includes gender, then it is pretty important to clean up my data set.
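That coverage check is easy to sketch in plain Python. The records and field names below are made up to mirror the gender example, not drawn from any real data set:

```python
# A minimal sketch of the data-cleaning check described above: measure how
# complete each feature is before trusting it in a model.

records = [
    {"age": 34, "gender": "F"},
    {"age": 51, "gender": None},
    {"age": 27, "gender": None},
    {"age": 45, "gender": "M"},
    {"age": 38, "gender": None},
]

def coverage(records, feature):
    """Fraction of records where the feature is present (not None)."""
    present = sum(1 for r in records if r.get(feature) is not None)
    return present / len(records)

for feature in ("age", "gender"):
    print(feature, coverage(records, feature))
# age is fully covered (1.0); gender is only 40% covered (0.4).
```

If a key feature like gender came back at 40% coverage, the options are the ones the scientists described: drop the incomplete records, impute the missing values, or reconsider using that feature at all.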
Linear regression is one of the most common and simplest methods of data analysis.
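For a sense of how simple it is, here is a one-variable ordinary-least-squares fit written from scratch. The data is invented and follows y = 2x + 1 exactly, so the fit recovers those coefficients:

```python
# One-variable linear regression (ordinary least squares): fit y = a*x + b
# by minimizing squared error, using the closed-form solution.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form OLS slope and intercept.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up example: some feature x vs. units sold y, with y = 2x + 1 exactly.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # → 2.0 1.0
```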
Neural networks and random forests (ensembles of many small decision trees, each built from random samples of the data and averaged to reduce overfitting) are methods that can cope with a plethora of factors and still provide accurate analyses.
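A toy version of the forest idea — many tiny trees ("stumps") trained on random bootstrap samples, then voting together — might look like the sketch below. A real random forest also randomizes which features each split considers; this one-feature example and its data are only illustrative:

```python
# Toy "random forest": decision stumps trained on bootstrap samples, voting
# by majority. Data is invented: the label is 1 exactly when x >= 5.
import random

random.seed(0)  # fixed seed so the example is reproducible

def train_stump(data):
    """Pick the single threshold on x that best separates the labels."""
    best = None
    for thresh, _ in data:
        acc = sum((1 if x >= thresh else 0) == y for x, y in data) / len(data)
        if best is None or acc > best[1]:
            best = (thresh, acc)
    return best[0]

def forest_predict(stumps, x):
    """Majority vote over all stumps."""
    votes = sum(1 for t in stumps if x >= t)
    return 1 if votes > len(stumps) / 2 else 0

data = [(x, 1 if x >= 5 else 0) for x in range(10)]

# Train each stump on a bootstrap: a random sample drawn with replacement.
stumps = [train_stump([random.choice(data) for _ in data]) for _ in range(25)]

print(forest_predict(stumps, 8), forest_predict(stumps, 2))  # → 1 0
```

The key design idea survives even in this toy: each weak learner sees a slightly different view of the data, and averaging their votes gives a more stable prediction than any single tree.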
I hope to continue shadowing these data scientists as they work over the next few weeks. Next week I aim to finish the research I have been conducting on the operational and product strategies of major clothing retailers, and hopefully plan another day trip to visit other clothing companies that are innovating in the field of merging fashion with physical technologies!