Predicting Success of a Reward Program at Starbucks | by Erdem Isbilen | Jun, 2023

Udacity provides three datasets in JSON format: portfolio, profile, and transcript. Each dataset serves a distinct purpose and supplies valuable information for our analysis.

Portfolio dataset

This dataset provides details about the active offers available at Starbucks.

  • id (string) — offer id
  • offer_type (string) — type of offer, i.e. BOGO, discount, informational
  • difficulty (int) — minimum required spend to complete an offer
  • reward (int) — reward given for completing an offer
  • duration (int) — time for offer to be open, in days
  • channels (list of strings)
Portfolio dataset (original) — Image by (author) Erdem Isbilen

There are 10 rows and 6 columns in the portfolio dataset. It’s a simple dataset with no missing, null, or duplicate values.

The ‘channels’, ‘id’, and ‘offer_type’ columns are categorical, whereas ‘difficulty’, ‘duration’, and ‘reward’ are integers.

See below the modifications I’ve made to the dataset:

  • one-hot encoding ‘channels’ and ‘offer_type’
  • renaming ‘id’ to ‘offer_id’
Portfolio dataset (after data wrangling) — Image by (author) Erdem Isbilen
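
The article does not show its code, so the following is only a minimal sketch of the two portfolio steps in plain Python over a list of records. The sample rows, the offer ids, the helper name `wrangle_portfolio`, and the `channel_` / `offer_type_` column prefixes are all illustrative assumptions. The field names `difficulty` and `duration` are assumed to be the dataset's names for the minimum spend and the offer lifetime.

```python
# Hypothetical sample rows shaped like the portfolio dataset described above.
portfolio = [
    {"id": "offer_a", "offer_type": "bogo", "difficulty": 10, "reward": 10,
     "duration": 7, "channels": ["email", "mobile", "social"]},
    {"id": "offer_b", "offer_type": "informational", "difficulty": 0, "reward": 0,
     "duration": 4, "channels": ["web", "email", "mobile"]},
]


def wrangle_portfolio(rows):
    """One-hot encode 'channels' and 'offer_type', and rename 'id' to 'offer_id'."""
    # Collect every category that occurs so each row gets the same columns.
    all_channels = sorted({ch for row in rows for ch in row["channels"]})
    all_types = sorted({row["offer_type"] for row in rows})

    cleaned = []
    for row in rows:
        out = {
            "offer_id": row["id"],  # renamed from 'id'
            "difficulty": row["difficulty"],
            "reward": row["reward"],
            "duration": row["duration"],
        }
        # 'channels' is a list, so a row can have several channel flags set.
        for ch in all_channels:
            out[f"channel_{ch}"] = int(ch in row["channels"])
        # 'offer_type' is a single value, so exactly one flag is set per row.
        for offer_type in all_types:
            out[f"offer_type_{offer_type}"] = int(row["offer_type"] == offer_type)
        cleaned.append(out)
    return cleaned


portfolio_clean = wrangle_portfolio(portfolio)
```

Because ‘channels’ holds a list, it is encoded as a multi-label indicator (several flags per row), while ‘offer_type’ produces exactly one flag per row.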

Profile dataset

The profile dataset contains demographic information about Starbucks customers.

  • age (int) — age of the customer
  • became_member_on (int) — date when customer created an app account
  • gender (str) — gender of the customer (note some entries contain ‘O’ for other rather than M or F)
  • id (str) — customer id
  • income (float) — customer’s income
Profile dataset (original) — Image by (author) Erdem Isbilen

There are 17000 rows (the number of unique people in the dataset) and 5 columns in this dataset, with 2175 null values in both the gender and income columns. As the age value is also 118 on these rows, I removed all 2175 rows from the dataset.

See below the modifications I’ve made to the dataset:

  • removing 2175 rows with missing values (which also have an age value of 118)
  • renaming ‘id’ to ‘customer_id’
  • converting ‘became_member_on’ from string to date
  • creating ‘year_joined’ and ‘membership_days’ columns
  • one-hot encoding ‘gender’
  • creating ‘age_group’ to categorize the customers as teenager, young-adult, adult, elderly
  • creating ‘income_range’ to categorize the customers as average, above-average, high
  • creating ‘member_type’ to categorize the customers as new, regular, loyal
Profile dataset (after data wrangling) — Image by (author) Erdem Isbilen
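
As a rough illustration of the profile steps, here is a plain-Python sketch. Several details are assumptions rather than facts from the article: the sample rows, the `YYYYMMDD` layout of ‘became_member_on’, the reference date used for ‘membership_days’, the age cut points, and the helper names `age_group` and `wrangle_profile`. ‘income_range’ and ‘member_type’ are left out because the article does not give their thresholds; they would follow the same pattern as ‘age_group’.

```python
from datetime import date, datetime

# Hypothetical sample rows shaped like the profile dataset described above.
profile = [
    {"id": "c1", "age": 55, "gender": "F", "income": 112000.0, "became_member_on": 20170715},
    {"id": "c2", "age": 118, "gender": None, "income": None, "became_member_on": 20170212},
    {"id": "c3", "age": 33, "gender": "M", "income": 48000.0, "became_member_on": 20151101},
]

# Assumed reference date for membership_days; the article does not state one.
REFERENCE_DATE = date(2018, 12, 31)


def age_group(age):
    # Assumed cut points; the article names the groups but not their boundaries.
    if age < 20:
        return "teenager"
    if age < 35:
        return "young-adult"
    if age < 60:
        return "adult"
    return "elderly"


def wrangle_profile(rows, today=REFERENCE_DATE):
    cleaned = []
    for row in rows:
        # Drop rows with missing gender or income (these also carry age 118).
        if row["gender"] is None or row["income"] is None:
            continue
        # Assumption: the date is stored as YYYYMMDD, e.g. 20170715.
        joined = datetime.strptime(str(row["became_member_on"]), "%Y%m%d").date()
        out = {
            "customer_id": row["id"],  # renamed from 'id'
            "age": row["age"],
            "income": row["income"],
            "became_member_on": joined,
            "year_joined": joined.year,
            "membership_days": (today - joined).days,
            "age_group": age_group(row["age"]),
        }
        # One-hot encode gender over the three values the dataset uses.
        for g in ("F", "M", "O"):
            out[f"gender_{g}"] = int(row["gender"] == g)
        cleaned.append(out)
    return cleaned


profile_clean = wrangle_profile(profile)
```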

The number of people joining the program shows an increasing trend between 2013 and 2017, with 2017 being the best year. 50% of the members are between the ages of 42 and 66.

Histograms of the age, income, and year_joined columns — Image by (author) Erdem Isbilen

As can be seen below, males outnumber females in the lower and average income ranges, whereas females outnumber males in the higher income range.

Histograms of income shown separately for the different gender groups — Image by (author) Erdem Isbilen

With respect to gender, the dataset is slightly biased: males outnumber females, and only a small number of people fall into the other category. In exact figures, there are 8484 males, 6129 females, and only 212 others in the dataset.

Histogram of gender — Image by (author) Erdem Isbilen

Transcript dataset

The transcript dataset captures customer interactions with the offers.

  • event (str) — record description (i.e. transaction, offer received, offer viewed, etc.)
  • person (str) — customer id
  • time (int) — time in hours since start of test. The data begins at time t=0
  • value — (dict of strings) — either an offer id or transaction amount depending on the record
Transcript dataset (original) — Image by (author) Erdem Isbilen

If an event in the transcript dataset corresponds to one of the three possible offer statuses (viewed, received, or completed), the value column contains the id of the offer. In addition to the offer id, there is a reward value when the event is in ‘offer completed’ status.

However, if the event is a transaction, the value column shows the transaction amount only.

See below the modifications I’ve made to the dataset:

  • Expanding ‘value’ into new ‘offer_id’, ‘amount’, and ‘rewards’ columns.
  • Creating ‘time_in_days’ by converting time (hours) to days.
  • Renaming ‘person’ to ‘customer_id’
  • Dividing the transcript dataset into two sub-datasets: offer_tr (offer data) and transaction_tr (transaction data)
offer_tr dataset (after data wrangling) — Image by (author) Erdem Isbilen
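
The transcript steps above can be sketched in plain Python as follows. This is an illustration under assumptions, not the author's code: the sample records and the helper name `wrangle_transcript` are made up, and the keys inside the value dict (`offer id`, `offer_id`, `amount`, `reward`) are assumed, with both spellings of the offer key accepted in case the raw data is inconsistent.

```python
# Hypothetical sample records shaped like the transcript dataset described above.
transcript = [
    {"person": "c1", "event": "offer received", "time": 0,
     "value": {"offer id": "offer_a"}},
    {"person": "c1", "event": "transaction", "time": 36,
     "value": {"amount": 12.5}},
    {"person": "c1", "event": "offer completed", "time": 36,
     "value": {"offer_id": "offer_a", "reward": 10}},
]


def wrangle_transcript(rows):
    """Expand 'value', add 'time_in_days', rename 'person', and split by event type."""
    offer_tr, transaction_tr = [], []
    for row in rows:
        value = row["value"]
        out = {
            "customer_id": row["person"],  # renamed from 'person'
            "event": row["event"],
            "time": row["time"],
            "time_in_days": row["time"] / 24,  # hours to days
            # Assumption: the offer id may be keyed as 'offer id' or 'offer_id'.
            "offer_id": value.get("offer id", value.get("offer_id")),
            "amount": value.get("amount"),    # only set for transactions
            "rewards": value.get("reward"),   # only set for completed offers
        }
        if row["event"] == "transaction":
            transaction_tr.append(out)
        else:
            offer_tr.append(out)
    return offer_tr, transaction_tr


offer_tr, transaction_tr = wrangle_transcript(transcript)
```

Fields that do not apply to a record are left as `None`, so offer rows carry no ‘amount’ and transaction rows carry no ‘offer_id’ or ‘rewards’.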
