## Advent of Code 2022 - Day 3: Rucksack Reorganization

For Day 3: Rucksack Reorganization, we are given an ominous premise: each string in our input represents a rucksack with compartments and all the items inside. The items, however, are stored improperly. Today is the first weekend puzzle, which tend to be more time intensive than those during the week. It's only Day 3, but could we be tasked with some variation of the Knapsack Problem, needing to reorganize the items to optimally fit?

Alas, nothing so complicated. (Phew!) Instead, our goal in both parts is to find common elements shared by various strings (or parts of strings).

### Part 1

In Part 1, we need to find the only character that appears in both the first and second halves of each input string. Then we need to run that through a `priority`

function to transform it into an integer.

Splitting the list is straightforward. Let's take the length of the list and use that to determine how many characters belong in each piece:

```
let size = length s `div` 2
in (take size s, drop size s)
-- Update: I just learned about `splitAt` in Prelude:
-- splitAt :: Int -> [a] -> ([a], [a])
(s1, s2) = splitAt (length s `div` 2) s
```

I didn't care to get fancy with my priority function today:

```
charPriority :: Char -> Int
charPriority c
| 'a' <= c && c <= 'z' = fromEnum c - fromEnum 'a' + 1
| 'A' <= c && c <= 'Z' = fromEnum c - fromEnum 'A' + 27
| otherwise = 0
```

#### Using Sets

Now, how should we compare the items in the first string with the items in the second? One option is to manually iterate over the characters in the first, checking if they are in the second. In the worst case, the common character will be at the end of each string, requiring `O(n^2)`

comparisons. Instead, I chose to first transform the strings into a more efficient data structure, then compare those instead.

- Construct a Set from each String.
`O(log n)`

- Combine the Sets by intersection.
`~O(n)`

assuming similarly sized sets. - The only item in the remaining Set is our answer.
`O(1)`

- Total Runtime:
`O(n)`

In practice, using asymptotic analysis here is questionable. Our input is very limited in size. This approach seems comparable to sorting the strings and then comparing them element-by-element, which is O(n log n) total. It doesn't seem possible to break the O(n) barrier, since you will always have the possibility of needing to look at every item in the list.

Here's the full code:

```
module AOC2022.Day03 (spec) where
import qualified Data.Set as Set
import Input
import Test.Hspec
input :: IO String
input = readDay 2022 3
spec :: IO ()
spec = hspec $ do
describe "Part 1" $ do
it "vJrwpWtwJgWrhcsFMMfFFhFp" $ do
part1 "vJrwpWtwJgWrhcsFMMfFFhFp" `shouldBe` 16
it "runs on custom input" $ do
myInput <- input
sum (fmap part1 (lines myInput)) `shouldBe` 0 -- redacted
standout :: [Char] -> Char
standout s =
let size = length s `div` 2
(c1, c2) = (Set.fromList $ take size s, Set.fromList $ drop size s)
in Set.findMin $ c1 `Set.intersection` c2
charPriority :: Char -> Int
charPriority c
| 'a' <= c && c <= 'z' = fromEnum c - fromEnum 'a' + 1
| 'A' <= c && c <= 'Z' = fromEnum c - fromEnum 'A' + 27
| otherwise = 0
part1 :: String -> Int
part1 = charPriority . standout
```

## Part 2 - Where Sets Pay Off

In Part 2, we no longer need to split the individual lines. Instead, we're looking for the common character in each group of 3 input lines. This is... basically the same thing! Let's factor out our `commonItem`

logic, and generalize it to work on any number of Strings:

```
commonItem :: [String] -> Char
commonItem = Set.findMin . foldr1 Set.intersection . fmap Set.fromList
```

With this factored out, the only difference between Part 1 and Part 2 is how we generate our groups of strings. While Part 1 splits each line into two strings, Part 2 process the input in chunks of 3. Full code:

```
module AOC2022.Day03 (spec) where
import Data.List.Split
import qualified Data.Set as Set
import Input
import Test.Hspec
input :: IO String
input = readDay 2022 3
spec :: IO ()
spec = hspec $ do
describe "Part 1" $ do
it "vJrwpWtwJgWrhcsFMMfFFhFp" $ do
part1 ["vJrwpWtwJgWrhcsFMMfFFhFp"] `shouldBe` 16
it "runs on custom input" $ do
myInput <- input
part1 (lines myInput) `shouldBe` 0 -- redacted
describe "Part 2" $ do
it "runs on custom input" $ do
myInput <- input
part2 (lines myInput) `shouldBe` 0 -- redacted
commonItem :: [String] -> Char
commonItem = Set.findMin . foldr1 Set.intersection . fmap Set.fromList
splitInHalf :: String -> [String]
splitInHalf s = [take size s, drop size s]
where
size = length s `div` 2
charPriority :: Char -> Int
charPriority c
| 'a' <= c && c <= 'z' = fromEnum c - fromEnum 'a' + 1
| 'A' <= c && c <= 'Z' = fromEnum c - fromEnum 'A' + 27
| otherwise = 0
part1 :: [String] -> Int
part1 = sum . fmap (charPriority . commonItem . splitInHalf)
part2 :: [String] -> Int
part2 = sum . fmap (charPriority . commonItem) . chunksOf 3
```

#### Runtime analysis

The runtime of `Set.intersection`

is `O(m log((n+1)/(m+1)))`

. For similarly sized Sets, the `log`

factor approaches a constant, which is why the runtime for Part 1 is O(n). In Part 2, our best case is that the first intersection results in a single item, after which the remaining intersections run with `m=1`

, or `O(log n)`

. In the worse case, however, all but the final intersection share every character, taking the same O(n) time.

The Set from `Data.Set`

is built with balanced binary trees. Another option is `Data.HashSet`

, which would have closer to O(n) runtime for all operations. However, Data.HashSet is missing the `findMin`

function that is so convenient here. While we know our result will contain a single item, HashSets in general are *unordered* containers with fewer options for lookups.

### Advent of Code 2022 Series

This post is part of a series describing my **Haskell** solutions to **Advent of Code 2022**.

See Also: Day 3 Bonus: Split in Half

Next: Day 4: Camp Cleanup Previous: Day 2: Rock Paper Scissors

Cheers!