Working With Text
- I'm really frugal, (cheap), where is the lowest price divisional football game?
- Please start by downloading this workbook
- This is from FiveThirtyEight's GitHub data set.
- The entire file is not usable at this time.
- We will need to split columns and strings.
- And will need to turn strings into integers.
- A term: string
- A string is a collection of characters
- It can include letters (a-z, A-Z)
- or digits (0-9)
- Or special characters.
- Words are strings, but strings are not necessarily words.
- Start by splitting column a on commas.
- Notice the title line (row 1) has text that includes a comma
-
- But this is inside quotation marks.
- Notice on the second step of the Text to Columns the "Text qualifier" box
-
- Make sure that this is set to "
- But move it to none and see what excel would do.
-
- This is how CSV files encode text that contains a comma.
- Format the Average ticket price as dollars
- We might want to investigate the region, so Split the division into conference and region.
-
- Take a look at the data in column a
-
- There is the away team.
- The word at
- The home team
- Tickets on
- A date
- A number in parenthesis. I have no idea what this is.
- In order to do any work, I will want columns containing
- Home and away team
- Month and day of month.
- Split on the word "at"
- Text to columns only splits on a single character.
- So I need something more.
- We will use a combination of text functions
- find
- Locates a substring in the middle of a string
- Three parameters, two required
- The substring I am looking for
- The string in which I am looking for it.
- By default excel starts searching at position 1.
- If we were looking for multiple occurrences of a letter, we can use the third parameter, start position.
- If nothing is found, the function returns an error.
- Make a new worksheet
- In column a put Hello WOLRD!
- Find "H",
- Find "L"
- Find "l"
- Find the second time "l" occurs
- Find the !
- Find "ell"
- Find "dan"
- left, right, mid
- These functions extract a subsstring from a string.
- They all take a string.
- Left and right take a length.
- mid takes a length and a starting position.
- All three return a string.
- Find the left and right 1-10 letters of "Hello World!"
- Find the mid starting at 2, for length 1-5 of "Hello World!"
- trim
- Takes a string
- Removes
- Leading and trailing spaces
- Extra paces in the middle
- Trim " Hello World! "
- Clean
- Removes non-printable spaces from a string.
- Back to work.
- General plan for splitting column A
- Do a left computation to grab the away team
- Find "at" as length to split.
- Do a mid computation on the string to find the home team.
- Find "Tickets on"
- Find the length of the substring
- Do a mid computation on the string to find the date
- Compute positions
- Add a new row at the top
- And three new columns between A and B at the top.
- Merge the top B1:D1
- Add labels
-
- Make sure you insert " at ", " Tickets on " and " (" in the cells in row 2.
- Don't use the quotation marks
- Do include the space before and after
- This will be important in a second.
- Use the find function to find the starting point of each of these in the correspond value in column A.
-
-
- Find the away team
- Insert a column between D and E
- Label it Away Team
- Use the left function to locate the team name
- I wrapped this inside of trim just to be sure I don't have extra spaces.
-
-
- Find the home team
- Insert a new column between E and F
- Label it Home Team
- This is a mid computation.
- Try =mid(a3, b3, c3)
- What went wrong?
- Use len to aid you with the starting position.
- Now work on the length
- Perhaps subtract off the position of at ?
- A len will be useful here too!
-
-
- Use this technique to extract the date from column A
- Finally splitting the month and date.
- Excel refuses to let us use text to columns on the new column G
- So we need another Left/Mid computation to extract the day and month.
- Do this in new columns H and I
- Also, use =int to change column H to integers
-
- I would like a list of unique team names
- Unfortunately the advanced filer only works on one column
- What happens if we have the unlikely event where a team is only on the road or only at home?
- Use the advanced filter to find the unique home and away teams in column n
- Use the Remove Duplicates button on the Data tab in the Data Tools workgroup to eliminate the duplicates
-