
Chapter 11 Regex: typo in the code example. Top-level domain validation expression #277

Open
@nikolayrantsev


Hello,
Thank you all so much for this course!

  1. I found a small typo in Chapter 11, section 'Extracting data using regular expressions':
    ...
    Here is our new regular expression:
    [a-zA-Z0-9]\S*@\S*[a-zA-Z]
    ... followed by the code block that uses this expression:
    If we use this expression in our program, our data is much cleaner:
# Search for lines that have an at sign between characters
# The characters must be a letter or number
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall('[a-zA-Z0-9]\S+@\S+[a-zA-Z]', line)
    if len(x) > 0:
        print(x)

# Code: http://www.py4e.com/code3/re07.py

Please replace the "+" signs in the line x = re.findall('[a-zA-Z0-9]\S+@\S+[a-zA-Z]', line) with "*" so that the code matches the expression given in the text.
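A quick way to see what the typo changes is to run both expressions on a couple of sample strings. This is a minimal sketch: the names PROSE_PATTERN, CODE_PATTERN and compare are hypothetical, the sample strings are made up for illustration (not read from mbox-short.txt), and the patterns are written as raw strings so that `\S` reaches the `re` module unchanged.

```python
import re

# Expression as printed in the chapter's prose: "*" = zero or more non-blank chars
PROSE_PATTERN = r'[a-zA-Z0-9]\S*@\S*[a-zA-Z]'
# Expression as printed in the re07.py listing: "+" = one or more non-blank chars
CODE_PATTERN = r'[a-zA-Z0-9]\S+@\S+[a-zA-Z]'


def compare(line):
    """Return (prose-pattern matches, code-pattern matches) for one line."""
    return re.findall(PROSE_PATTERN, line), re.findall(CODE_PATTERN, line)


# Hypothetical sample strings written for this illustration
samples = [
    'From: x@y.com',
    'dhorwitz@david-horwitz-6:~/branchManagemnt/sakai_2-5-x',
]
for sample in samples:
    print(sample, compare(sample))
```

With "+", the text needs at least two characters on each side of the "@", so a short address such as x@y.com is missed. Neither version requires a dot after the "@", which is why both match the dhorwitz string from point 2 below.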

  2. Interestingly, when I run the code with the corrected expression [a-zA-Z0-9]\S*@\S*[a-zA-Z], the results still include matches like:
    [ 'dhorwitz@david-horwitz-6:~/branchManagemnt/sakai_2-5-x']

I would appreciate an explanation of how to improve the expression so that it filters out matches that are not email addresses with a top-level domain.

Thank you!
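One possible approach, sketched here as an assumption rather than the book's method: keep the rule that a match starts with a letter or digit, restrict the characters allowed around the "@", and require the match to end in a dot followed by at least two letters. EMAIL_WITH_TLD and find_emails are hypothetical names, the allowed character sets are a simplification, and the sample lines are written for illustration.

```python
import re

# Hypothetical stricter pattern (an assumption, not taken from the book)
EMAIL_WITH_TLD = re.compile(
    r'[a-zA-Z0-9][\w.+-]*'   # local part: starts with a letter or digit
    r'@[a-zA-Z0-9-]+'        # "@" plus the first domain label
    r'(?:\.[a-zA-Z0-9-]+)*'  # zero or more further labels, e.g. ".sakaiproject"
    r'\.[a-zA-Z]{2,}'        # required top-level domain: a dot and 2+ letters
)


def find_emails(line):
    """Return substrings of line that look like user@host.tld.

    The group is non-capturing, so findall returns whole matches.
    """
    return EMAIL_WITH_TLD.findall(line)


# Hypothetical sample lines written for this illustration
samples = [
    'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008',
    '<source@collab.sakaiproject.org>;',
    'dhorwitz@david-horwitz-6:~/branchManagemnt/sakai_2-5-x',
]
for sample in samples:
    print(find_emails(sample))
```

In the re07.py loop, x = find_emails(line) would take the place of the re.findall call. The trade-off is that anything without a dotted domain, such as an address at localhost, is dropped as well, and the pattern remains much looser than the full address grammar in RFC 5322.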
