Categories
Programming

Problem-Solving Patterns: Parsing

When you solve enough problems, you start to run into patterns of logic. Every problem has many different ways to solve it, but if you learn some ways to think about the logic behind the problem, you will see connections that will help you solve new problems.

For a beginning programmer like me, it is tempting to want to look for a method that will solve the problem in one line. However, it is important to understand the logic behind the methods – especially since working with these kinds of problems is new to me. Then, once you see a problem that is slightly different but has a similar logical pattern to it, you can solve it more creatively. Launch School places so much emphasis on mastery-based, deep learning because the real world does not have magic methods. Flexibility and creativity are key in dealing with the idiosyncrasies of life, and also to keep up with the ever-changing languages and systems in technology.

Here is an in-depth explanation of a logical pattern I have found useful in solving problems that have to do with parsing, especially string parsing.

Parsing pattern

If asked to find the first instance of an element with a particular quality, but that element is buried in the middle of a collection with repeated instances of other elements that may or may not have that same quality, how do you make sure you are accessing the correct data from that collection?

For example, let’s say you want to find the first longest set of repeated consecutive letters from the following string:

string = 'aabbbcddd'

The string has two sets of repeated consecutive characters: 'bbb' and 'ddd'. How do you make sure to return the first set and not the second, and how do you hold on to this info while still checking the rest of the collection to make sure there aren’t longer sets?

As you are iterating over the collection, try saving these elements by assigning a local variable to them. As you loop/iterate, reassign the local variable to the new element if there is a better match to the particular quality you are looking for (if it’s longer, for example, or appears more often in the string, or is the last instance).

Example

Problem: Write a program that iterates over the strings and outputs the last integer value of each string, storing each integer in an array.

string_array = [
 "Web IconHTML & CSS10%",
 "JQuery IconjQuery55%",
 "Angular JSLearn AngularJS 1.X100%"
]
 
string_parser(string_array) ==  [10, 55, 100]

You have to iterate over the given array to access each string, but this, of course, is not the hardest part of the problem. At the heart of this problem, it asks you to look for the last integer within a string. This would be very easy to do if there was only one integer per string in the array, but in the last string in the given array, we see that there are two numbers separated by other characters, but still part of the same word: "Angular JSLearn AngularJS 1.X100%"

This is the trickiest part of the given test case and the problem we need to solve first. The solution we find for the other parts of this test case will probably not work for this one, so we need to find a solution that works for all of them but keeping this particular case in mind.

In this particular example, can we call this last string object in the given array an edge case? From what I understand of edge cases, they are more about what to do if there is unexpected input, or input that pushes the boundaries of what we expect our method to do. For example, what would you return if one of the objects in the array were an empty string, or a string with no integers whatsoever? Would you add a nil to the result array, or a 0, or nothing at all?

The last string in the given array is actually a more standard kind of input, just a bit trickier than the other string objects in the array argument that only have one set of string integer digits. You need to return an integer in your results array, but how do you access the correct integer? I think this is an important distinction because we often tend to leave edge cases for the end of our problem-solving, but in this case, we would end up solving the problem in a completely different way, and this might distract us from the point of the problem.

We need a way to extract one number without extracting all of the numbers because we only want the last numbers. If we could extract all of the numbers at once, this problem would be simple. However, we must keep '1' separate from '100'. How can we do this? For illustrative purposes, I have extracted this particular problem to a helper method called get_last_num_from_string:

def get_last_num_from_string(str)
  # put code here
end
 
p get_last_num_from_string("Angular JSLearn AngularJS 1.X100%")

In our method definition, we will start by iterating over the elements in the string to find the last set of numbers. We will check if each character is an integer (a string integer, in this case). Every time we come across a number, we will store it in a container, grouped together in sets if there are multiple consecutive numbers.

Here is where we could take at least two different routes. We could iterate forward over the string, or since we are looking for the last number or set of numbers, we could iterate backward over the string. If we iterate backward over the string, we can easily access the first numbers that appear and return those as soon as we reach another character that is not a number.

The forward iteration might be longer, but it is worth explaining. As we are iterating over the values, we can store the string integer(s) into a number container. Let’s call it current_num.

In the string "Angular JSLearn AngularJS 1.X100%", the first number we come across will be '1'. We continue to iterate over the characters in the string, though, because we want to make sure we find the last number(s) in the string. As soon as we come across a character that is not an integer, we can hold on to the values that current_num is currently pointing to by assigning them to another local variable container. We can then reset the current_num container to point to an empty container. We repeat this process every time we come across a new integer or set of integers.

Once we reach the final iteration, we will need to check the current_num container one last time to see if it has something in it. If it does, we should return that, because it is the last number(s). If not, we should return the values in the result container.

With the above algorithm, there are many ways to write out this code, but you can see that my solution is fairly simple: I don’t use Regex, I use string integers to avoid converting the string numbers to a integers every time I check to see if they are numbers, and I use string containers to hold the objects pointed to by result and current_num.

# Forward version
def get_last_num_from_string(string)
  integer_strings= ('0'..'9').to_a
  current_num = ''
  result = ''

  string.chars.each do |char|
    if integer_strings.include?(char)
      current_num << char
    elsif !integer_strings.include?(char) && current_num != ''
      result = current_num
      current_num = ''
    end
  end
  result = current_num if current_num != ''
  result.to_i
end

p get_last_num_from_string("Angular JSLearn AngularJS 1.X100%") # => 100
# Backward version
def get_last_num_from_string(string)
  integer_strings= ('0'..'9').to_a
  current_num = ''

  string.chars.reverse.each do |char|
    if integer_strings.include?(char)
      current_num << char
    else
      return current_num.reverse.to_i if current_num != ''
    end
  end
end

p get_last_num_from_string("Angular JSLearn AngularJS 1.X100%") # => 100

These method definitions make use of local variables to store data outside of the block while iterating over the argument. This way you can compare the things you find in the rest of the iterations to the data you have already selected.

Other Examples

The following problem involves analyzing consecutive parts of an array. It can be solved by joining the given amount of strings together into one string and saving it into a local variable to compare by size to other sets of strings. You will only replace the values stored in the local variable if the current set is longer than the saved set. Try this technique for yourself.

# You are given an array `strarr` of strings and an integer `k`. Your task is to return the first longest string consisting of k consecutive strings taken in the array.

p longest_consec(["zone", "abigail", "theta", "form", "libe", "zas"], 2) == "abigailtheta"
p longest_consec(["zone", "abigail", "theta", "form", "libe", "zas"], 3) == "zoneabigailtheta"

Here is another problem that is best solved by finding and saving the last number before a letter (hint: you can initialize the local variable to a number instead of an empty string like in the previous examples. Think of the letters in a mathy way).

# Simple Simple Simple String Expansion (from Codewars)

# Given a string that includes alphanumeric characters ('3a4B2d') return the expansion of that string: The numeric values represent the occurrence of each letter preceding that numeric value. There should be no numeric characters in the final string. Empty strings should return an empty string.
# The first occurrence of a numeric value should be the number of times each character behind it is repeated, until the next numeric value appears.

p string_expansion('3D2a5d2f') == 'DDDaadddddff'
p string_expansion('3abc') == 'aaabbbccc'       
p string_expansion('3d332f2a') == 'dddffaa'
p string_expansion('abcde') == 'abcde'
p string_expansion('') == ''

This is just one way to solve this kind of problem, but I have found that it is a helpful pattern to be aware of in the future. A note of caution: be careful not to solve a problem from memory alone. However, once you are aware of enough patterns, you can mix them up and reuse them in creative ways. The most important thing is to be aware of the patterns and the variety of ways to solve problems because you will need to consider your options when deciding which tool to use. I hope you start to see patterns in your own problem-solving.

CODEWARS BONUS: Here are more problems that can be solved using this pattern:

Repeated substring

Longest vowel chain

Longest alphabetical string