Skip to content

Latest commit

 

History

History

order-analysis

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 

Order Analysis

Order Analysis

The Problem

Read in the big list of mock orders from this json file. An order is a record of a customer who placed the order for products and quantities on a given date. An order has many order lines each of which contains a quantity that was ordered and the price per item (so the total price of that order line is \(price × quantity\).)

{
    "id": "feeab806-1fd0-4986-af86-4af9e6b98068",
    "customerId": "a489bdd9-3f4e-4181-9a94-a123b9b5ea08",
    "isFulfilled": true,
    "placedOn": "2020-05-10",
    "orderLines": []
    {
        "id": "7968e6ca-9f24-419e-9d5c-8088e7354e33",
        "productName": "Bagelers",
        "quantity": 10,
        "price": 1.98,
        "productId": "395b4411-a748-49a7-8bc4-2a2debcfa75e"}
    ,
    {
        "id": "0747ad3d-b814-4200-b47e-732e87469a0a",
        "productName": "Chinese Lemon Pork",
        "quantity": 9,
        "price": 2.89,
        "productId": "03c6b0a8-bc00-496e-b91d-11f5e9152360"}}


If the below seems too easy, then do it in your second strongest language or one that you are learning.

Challenge 1

Write a function that returns a map/dictionary/object/any associative structure that has all the days as week as keys (“Sunday, Monday, etc…”) and the average total price of all orders placed on each day. If a day does not appear in the dataset (hint - there are no Saturdays), then it should still have a key with a value of 0.

Challenge 2

Which customer ordered the most items? If more than one customer ordered the most then which one of those spent the most money?

Which 10 day time period saw the most orders? The highest order total?

Challenge 3

Assuming your programming language supports this (most do), make the days of week from Challenge 1 localized to the a passed in culture (so the turkish code =tr= should localize to days in Turkish), use the current user’s culture by default.

Mock data:

I generated a bunch of mock data from from Mockaroo. To make it work for the problem I need to remove all orders on Saturdays from it. We also want some repeat customers. Mockaroo’s interface was not flexible enough to do this, so a bit of python will help

import json
from datetime import datetime
from random import sample, choice

with open("./original_mock_orders.json") as f:
    orders = json.load(f)

orders = [o for o in orders if (day := datetime.fromisoformat(o["placedOn"]).strftime("%A")) != "Saturday"]
customers = list(sample(set(o["customerId"] for o in orders), 30))

for o in orders:
    o["customerId"] = choice(customers)

with open("./mock_orders.json", "w") as f:
    json.dump(orders, f)

And now to test that I did this correctly

import json
from datetime import datetime

with open("./mock_orders.json") as f:
    all_orders = json.load(f)

return {}
    "total": len(all_orders),
    "placed on Saturday": len([o for o in all_orders if datetime.fromisoformat(o["placedOn"]).strftime("%A") == "Saturday"]),
    "unique customers": len(set([o["customerId"] for o in all_orders])),
{‘total’: 83, ‘placed on Saturday’: 0, ‘unique customers’: 28}

Perfect.

Playground

I want to do this in racket so…lets do this in racket

First, can I even run racket in this thing? I remember I had it running once and then I kinda broke things.

(* 3 4 6)
72

Ok, cool, I fixed it.

(define foo (list-ref (call-with-input-file "mock_orders.json" read-json)
                      1))
(println foo)
(println (hash-ref foo 'customerId))
'#hasheq((customerId . "a489bdd9-3f4e-4181-9a94-a123b9b5ea08") (id . "feeab806-1fd0-4986-af86-4af9e6b98068") (isFulfilled . #t) (orderLines . (#hasheq((id . "7968e6ca-9f24-419e-9d5c-8088e7354e33") (price . 1.98) (productId . "395b4411-a748-49a7-8bc4-2a2debcfa75e") (productName . "Bagelers") (quantity . 10)) #hasheq((id . "0747ad3d-b814-4200-b47e-732e87469a0a") (price . 2.89) (productId . "03c6b0a8-bc00-496e-b91d-11f5e9152360") (productName . "Chinese Lemon Pork") (quantity . 9)))) (placedOn . "2020-05-10"))
"a489bdd9-3f4e-4181-9a94-a123b9b5ea08"

Alright so it looks like reading the json file returns a list of Racket hashtables. With the above I know

  • how to read json
  • what the resulting structure is
  • how to access a keyed value

    What about how to map over a collection?

    (define orders (call-with-input-file "mock_orders.json" read-json))
    (define customer-ids (map (lambda (o) (hash-ref o 'customerId) ) orders))
    (remove-duplicates customer-ids)
        
    '("6c5aca11-a54b-4a64-9233-ebfda91ea1e6" "a489bdd9-3f4e-4181-9a94-a123b9b5ea08" "b80b0360-28fd-4677-9a9d-a8ce7a86a56e" "2f32fb34-a687-4f52-aa29-be4acb338660" "2491fb3d-6ab9-4192-9538-50477820d716" "977f6c1e-668b-4fa7-aeff-ea91f5242636" "59e85288-7b5c-413b-a340-1f4f1d8468ff" "26626efd-510b-4269-ba3f-37ed4e42a52a" "87799be6-e4c2-48b2-993d-c7e3367920e4" "260bbce1-298a-49cf-9a2e-53c6680dfc47" "3d5e4eea-ff32-4ca1-8bb9-5d02186cea54" "1a22fd10-7e4a-401e-a05e-bb13a8e55e57" "c12b6025-663d-49ed-b099-e05562fa4473" "63c897fe-3f57-4d34-91a1-57ad1798b8c7" "47c70cc7-e8b7-4d5a-84c1-1da97e2e0a7c" "e18c9c9f-0a11-4755-b234-41de4ea4ec37" "d0e31ec2-626a-455e-9e43-44bd98b4785f" "df9b7ed0-e69d-46ed-9c4d-e271e7687399" "8445e094-72ec-48b1-b3d3-b955fe3c70e2" "53131378-35af-411f-bf95-a43170138f13" "74e4a051-7d72-4fd0-abb8-1a6b85fdfadd" "3835e5df-cd95-48b5-a530-71f48cf8d775" "62cef68d-25ec-4b61-b5ec-d7c1aafc87d6" "6139a129-fb1c-4b1a-9446-0c2b06622b76" "c66cbbb2-4136-4e60-ab2f-1843f33b70dc" "32039908-c2d9-4f88-ac42-0342063702ce" "ec176281-84b5-4f2c-bc83-4378473b0b75" "54939d94-8613-49e1-9198-1104ffcd1686")
        

    Nice! We also know how to uniqueify.

Ok, grouping is going to be a thing here. How do we group-by?

(group-by first (cartesian-product '(4 5 6) '(a b c d)))
'(((4 a) (4 b) (4 c) (4 d)) ((5 a) (5 b) (5 c) (5 d)) ((6 a) (6 b) (6 c) (6 d)))

cool, and that returns a straightforward list of lists

What about getting days of the week? Nothing in the built in racket/date the gregor package looks promising

Packages

It occurs to me that I don’t know the first thing about racket packages. Let’s start by learning about modules.

A collections is a hierarchical grouping of installed library modules. Basically this-part/of-the-require-statement. They tend to correspond to filesystem directories (though sounds like that’s not 100%). Here is where the racket collection resides for example

(require setup/dirs)

(build-path (find-collects-dir) "racket")
#<path:/usr/local/Cellar/minimal-racket/7.7/share/racket/collects/racket>
ls /usr/local/Cellar/minimal-racket/7.7/share/racket/collects/racket

Package management

Ok, in eshell (because it took a while I ran racko pkg install --auto gregor)

(displayln (~t (today) "EEEE"))
Saturday

Huh, and look right in the docs there’s an example of how to control culture

(parameterize ([current-locale "tr"])
  (~t (today) "EEEE"))
"Cumartesi"

One thing that could be helpful here is to use the threading macro

(~> 1 add1 sqrt)
1.4142135623730951

Hold on, in the docs it gives the following more complicated example where we can achieve this

(- (bytes-ref
    (string->bytes/utf-8 (symbol->string 'abc))
    1)
   2)
96

by doing this instead

(~> 'abc
    symbol->string
    string->bytes/utf-8
    (bytes-ref 1)
    (- 2))
96

Note that in the case of both bytes-ref and -, not only are functions curried automatically but they’re curried right to left where the threaded parameter will be in the first argument position! If racket does that by default, it seems likely that this is what is usually useful in racket

Ok, what about pattern matching?

(define (try-match a)
  (match a
    [0 'zero]
    [a (cons 'val a)]))
(list (try-match 0) (try-match 123))
'(zero (val . 123))

Ok nice! that’s the simple case but it works well

What about subtracting to dates to find out the amount of days that have passed?

(require gregor)
(require gregor/period)
(define d1 (iso8601->date "2020-06-12"))
(define d2 (iso8601->date "2020-05-21"))
(period-ref (date-period-between d1 d2 '(days)) 'days)
-22

On a side note, I am really impressed with racket documentation and language design. I barely know what I’m doing here and am able to find everything that I need.

Implementation

First we read in the file of course

(define (load-orders) (call-with-input-file "mock_orders.json" read-json))

So as I see it, to fulfill Challenge 3 I need to be able to generate the full list of week day names automatically. Additionally this would be needed to fulfill the requirement of Challenge 1 that days absent from the dataset (Saturdays) be listed.

Challenge 1

The plan here will be to just start on a known day, and iterate through the following 7 days formatting the day name out of each

(define first-day-of-a-week (date 2020 5 31))
(define (nth-date-of-the-week n) (+days first-day-of-a-week n))
(define (date-day-name d) (~t d "EEEE"))
(define (days-of-the-week) (map (compose date-day-name nth-date-of-the-week) (range 0 7)))
<<days-of-the-week>>
(days-of-the-week)
'("Sunday" "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" "Saturday")

Sweet deal

Now lets create a hash with all orders grouped by day

(define order-placed-on (lambda~> (hash-ref 'placedOn)))

(define order-day-name (lambda~>
                        (order-placed-on)
                        iso8601->date
                        date-day-name))
(define (first-order-day-name orders) (order-day-name (first orders)))
(define (cons-day-name-orders orders) (cons (first-order-day-name orders) orders))
(define group-by-day (lambda~>>
                      (group-by order-placed-on)
                      (map cons-day-name-orders)
                      make-immutable-hash))
(require threading)
(define (order-line-price ol)
  (let* ([price (hash-ref ol 'price)]
         [quantity (hash-ref ol 'quantity)])
    (* price quantity)))
(define order-total (lambda~>> (hash-ref _ 'orderLines)
                               (map order-line-price)
                               (apply +)))

(define orders-total (lambda~>>
                      (map order-total)
                      (apply +)
                      (* 1.0)))
(require gregor)
(require json)
(require racket/hash)
(require racket/match)

<<group-by-day>>
<<days-of-the-week>>
<<load-orders>>
<<orders-total>>

(define cons-with-empty-list (curryr cons (list)))
(define (orders-average-total orders)
  (match (length orders)
    [0 0]
    [n (/ (orders-total orders) n)]))
(define (empty-list-per-day-of-week)
  (~>> (days-of-the-week)
       (map cons-with-empty-list)
       make-immutable-hash))
(define group-orders-by-day (lambda~> group-by-day
                                      (hash-union (empty-list-per-day-of-week) #:combine append))) ;;fill in empty days
<<group-orders-by-day>>
(let* ([orders (load-orders)]
       [orders-by-day (group-orders-by-day orders)])
  (for/hash ([k (hash-keys orders-by-day)])
    (values k (orders-average-total (hash-ref orders-by-day k)))))
'#hash(("Friday" . 53.315) ("Monday" . 68.96000000000001) ("Saturday" . 0) ("Sunday" . 102.53) ("Thursday" . 14.219999999999999) ("Tuesday" . 58.42) ("Wednesday" . 41.459999999999994))

Challenge 3

Just skipping ahead here but challenge 3 becomes really simple now with dynamic parameters. Let’s see it in turkish

<<group-orders-by-day>>
(parameterize ([current-locale "tr"])
  (let* ([orders (load-orders)]
         [orders-by-day (group-orders-by-day orders)])
    (for/hash ([k (hash-keys orders-by-day)])
      (values k (orders-average-total (hash-ref orders-by-day k))))))
'#hash(("Cuma" . 53.315) ("Cumartesi" . 0) ("Pazar" . 102.53) ("Pazartesi" . 68.96000000000001) ("Perşembe" . 14.219999999999999) ("Salı" . 58.42) ("Çarşamba" . 41.459999999999994))

Nice!

Challenge 2

Customer with most items

So to find the best customer we

  • group orders by customer
  • group results by largest size
  • sort
  • take last: this will be the customers and their orders that ordered the most items (length, (list (customer, (list order))))
  • unwrap and sort by order total
  • take last: (customer, (list order))
  • get customer
<<group-orders-by-day>>
(define orders-total (lambda~>>
                      (map order-total)
                      (apply +)))
(~>> (load-orders)
     (group-by (lambda~> (hash-ref 'customerId)))
     (group-by length)
     (sort _ > #:key length)
     (first) ;;list (customerId grouped) of list of orders
     (sort _ > #:key orders-total)
     (first) ;; list of orders for a given customer
     (map (lambda~> (hash-ref 'customerId)))
     (first))
"260bbce1-298a-49cf-9a2e-53c6680dfc47"

10 day period with the most orders

So this has the need to chunk orders by their `placedOn` date within 10 of each other. Lets start by creating a function that takes two orders and determines if they’re within ten days of each other

(require gregor)
(require gregor/period)
(define placed-on-date (lambda~> (hash-ref 'placedOn)
                                 iso8601->date))
(define (date-within-days? max-days d1 d2)
  (~> (days-between d1 d2)
      abs
      (<= max-days)))
(define (orders-within-10-days? o1 o2)
  (date-within-days? 10 (placed-on-date o1) (placed-on-date o2)))

Now let’s create a generic function that takes a sequence of ordered items and moves a sliding window over them that ensures they are within some distance of each other. To test it we can take a sequence of numbers and create sliding windows of groups that are within 3 of each other

136610121214

So we process these one by one, accumulating them into a list which

  • always contains the item being iterated on
  • guarantees that all items are within - in this case - 3 of each other. Any items preventing counter to this guarantee from being kept are not included

So in this case the output would be

  • 1
  • 1 3
  • 3 6
  • 3 6 6
  • 10
  • 10 12
  • 10 12 12
  • 12 12 14

Here would be a good implementation. We use dynamic scope via make-parameter

(require threading)
(require rebellion/streaming/transducer)
(require rebellion/collection/list)

(define (prepend-to-elements-within-distance within-distance? items item)
  (~>> items
       (filter (curry within-distance? item))
       (cons item)))

(define (group-into-sliding-distance-windows within-distance? data)
  (transduce data
             (folding (curry prepend-to-elements-within-distance within-distance?) '())
             #:into into-list))
<<group-into-sliding-distance-windows>>
(define (within-3? a b) (~> a
                            (- b)
                            (abs)
                            (<= 3)))
(group-into-sliding-distance-windows within-3? (first data))
'((1) (3 1) (6 3) (6 6 3) (10) (12 10) (12 12 10) (14 12 12))

We also just need a simple utility that can get both min and max from a list. We could do it easiest by sorting and getting the first and last. This isn’t the most efficient way to go about it because we’re then at O(n*log(n)) instead of just O(n) but meh, we don’t have all that many orders here for it to matter.

(define (minimum-maximum less-than? lst)
  (let ([sorted-lst (sort lst less-than?)])
    (values (first sorted-lst) (last sorted-lst))))

We should now be able to take all orders sorted by their placedOn date and to group them into windows

(require json)
<<group-into-sliding-distance-windows>>
<<load-orders>>
<<minimum-maximum>>
<<orders-total>>

(define placed-on (curryr hash-ref 'placedOn))

(struct date-stats (min-date max-date order-count orders-total) #:transparent)
(define (date-stats-of-order-group orders)
  (let-values ([(min max) (~>> orders
                               (map placed-on)
                               (minimum-maximum string<?))])
    (date-stats min max
                (length orders)
                (orders-total orders))))

(define (date-stats-on-orders-in-window within-distance? select-interesting-order-group-charecteristic)
  (~>> (load-orders)
       (sort _ string<? #:key placed-on)
       (group-into-sliding-distance-windows within-distance?)
       (sort _ > #:key select-interesting-order-group-charecteristic)
       (group-by select-interesting-order-group-charecteristic)
       first
       (map date-stats-of-order-group)))
<<date-stats-on-orders-in-window>>
<<orders-within-10-days?>>
(date-stats-on-orders-in-window orders-within-10-days? length)
(list (date-stats "2019-09-01" "2019-09-11" 7 373.27) (date-stats "2020-03-08" "2020-03-16" 7 228.75))

Perfect. looks like there are two groups of seven orders that fit within a span of ten days with the dates above.

10 day period with the highest order total

We can use the above function we made to do this easily

<<date-stats-on-orders-in-window>>
<<orders-within-10-days?>>
(date-stats-on-orders-in-window orders-within-10-days? orders-total)
(list (date-stats "2019-07-30" "2019-08-05" 5 387.82))

And there we go. Between 2019-07-30 and 2019-08-05 there were 5 orders totalling $387.82.