getting empty data #169

ihabpalamino · 2023-07-12T15:31:01Z

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from Scweet.scweet import scrape

# Specify the parameters for scraping
username = "2MInteractive"
since_date = "2023-07-01"
until_date = "2023-07-11"
headless = True

# Set up the ChromeDriver service
service = Service("C:/Users/HP Probook/Downloads/chromedriver.exe") # Replace with the actual path to chromedriver

# Set up the ChromeOptions
options = webdriver.ChromeOptions()
options.headless = headless

# Create the WebDriver
driver = webdriver.Chrome(service=service, options=options)

# Scrape the tweets by username
data = scrape(from_account=username, since=since_date, until=until_date, headless=headless, driver=driver)

# Print the scraped data
print(data)

# Close the WebDriver
driver.quit()

Getting empty data:

"C:\Users\HP Probook\PycharmProjects\firstproject\venv\Scripts\python.exe" "C:/Users/HP Probook/PycharmProjects/firstproject/TikTokScrap.py"
looking for tweets between 2023-07-01 and 2023-07-06 ...
path : https://twitter.com/search?q=(from%3A2MInteractive)%20until%3A2023-07-06%20since%3A2023-07-01%20&src=typed_query
scroll 1
scroll 2
looking for tweets between 2023-07-06 and 2023-07-11 ...
path : https://twitter.com/search?q=(from%3A2MInteractive)%20until%3A2023-07-11%20since%3A2023-07-06%20&src=typed_query
scroll 1
scroll 2
Empty DataFrame
Columns: [UserScreenName, UserName, Timestamp, Text, Embedded_text, Emojis, Comments, Likes, Retweets, Image link, Tweet URL]
Index: []

Process finished with exit code 0

baqachadil · 2023-07-13T10:48:47Z

I have the exact same issue. I can see Selenium searching through tweets for the specified period, but no data is returned. This is my code:

driver = init_driver(headless=False, show_images=False)

log_in(driver, env=".env")

data = scrape(words=['crypto', 'etheium', 'bitcoin'], hashtag='crypto', since="2023-02-01", until="2023-02-05", from_account=None, interval=1, headless=False, display_type=None, save_images=False, lang="en", resume=False, filter_replies=False, proximity=False, driver=driver)

print(data)

console:
looking for tweets between 2023-2-01 and 2023-02-02 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-02%20since%3A2023-2-01%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
scroll 4
scroll 5
scroll 6
scroll 7
scroll 8
looking for tweets between 2023-02-02 and 2023-02-03 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-03%20since%3A2023-02-02%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
scroll 4
scroll 5
scroll 6
scroll 7
scroll 8
scroll 9
looking for tweets between 2023-02-03 and 2023-02-04 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-04%20since%3A2023-02-03%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
scroll 4
scroll 5
scroll 6
scroll 7
looking for tweets between 2023-02-04 and 2023-02-05 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-05%20since%3A2023-02-04%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
scroll 4
scroll 5
scroll 6
scroll 7
looking for tweets between 2023-02-05 and 2023-02-06 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-06%20since%3A2023-02-05%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
scroll 4
scroll 5
scroll 6
scroll 7
scroll 8
looking for tweets between 2023-02-06 and 2023-02-07 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-07%20since%3A2023-02-06%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
scroll 4
scroll 5
scroll 6
scroll 7
looking for tweets between 2023-02-07 and 2023-02-08 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-08%20since%3A2023-02-07%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
scroll 4
scroll 5
scroll 6
scroll 7
scroll 8
looking for tweets between 2023-02-08 and 2023-02-09 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-09%20since%3A2023-02-08%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
scroll 4
scroll 5
scroll 6
scroll 7
scroll 8
looking for tweets between 2023-02-09 and 2023-02-10 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-10%20since%3A2023-02-09%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
scroll 4
looking for tweets between 2023-02-10 and 2023-02-11 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-11%20since%3A2023-02-10%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
looking for tweets between 2023-02-11 and 2023-02-12 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-12%20since%3A2023-02-11%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
looking for tweets between 2023-02-12 and 2023-02-13 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-13%20since%3A2023-02-12%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
looking for tweets between 2023-02-13 and 2023-02-14 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-14%20since%3A2023-02-13%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
looking for tweets between 2023-02-14 and 2023-02-15 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-15%20since%3A2023-02-14%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
looking for tweets between 2023-02-15 and 2023-02-16 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-16%20since%3A2023-02-15%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
looking for tweets between 2023-02-16 and 2023-02-17 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-17%20since%3A2023-02-16%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
looking for tweets between 2023-02-17 and 2023-02-18 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-18%20since%3A2023-02-17%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
looking for tweets between 2023-02-18 and 2023-02-19 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-19%20since%3A2023-02-18%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
looking for tweets between 2023-02-19 and 2023-02-20 ...
path : https://twitter.com/search?q=(%23crypto)%20until%3A2023-02-20%20since%3A2023-02-19%20lang%3Aen&src=typed_query
scroll 1
scroll 2
scroll 3
scroll 4
Empty DataFrame
Columns: [UserScreenName, UserName, Timestamp, Text, Embedded_text, Emojis, Comments, Likes, Retweets, Image link, Tweet URL]

baqachadil · 2023-07-18T13:12:27Z

I managed to find the problem. First, go to the function get_data in 'Scweet\utils.py' and change all instances of

find_element_by_xpath('...') to find_element('xpath', '...'), because find_element_by_xpath is no longer supported in recent versions of Selenium.

Second, check that every function that returns an element from the HTML is actually returning something. It appears that if even one element is null, the whole tweet is treated as null, for example when Selenium cannot find the username of the tweet. To do this, verify that each XPath is correct. Here is one example, but you should check all of them:

try:
        text = card.find_element('xpath','.//div[2]/div[2]/div[1]').text
except:
        text = ""

should actually be:

try:
        text = card.find_element('xpath','.//div[2]/div[2]/div[2]/div').text
except:
        text = ""
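Checking every XPath by hand is slow, so one option is to probe a single tweet card and list which lookups fail. The sketch below is only an illustration: `check_xpaths`, `StubCard`, and the `XPATHS` table are hypothetical names, not part of Scweet, and the stub merely stands in for a Selenium element so the snippet runs without a browser. With a real card, pass the WebElement instead of the stub.

```python
def check_xpaths(card, xpaths):
    """Return {field name: True/False} telling which XPaths match inside `card`.

    `card` is anything with a Selenium-style find_element(by, value) method.
    """
    results = {}
    for name, xpath in xpaths.items():
        try:
            card.find_element('xpath', xpath)
            results[name] = True
        except Exception:  # with Selenium, a miss raises NoSuchElementException
            results[name] = False
    return results


class StubCard:
    """Hypothetical stand-in for a tweet card; matches only a fixed set of XPaths."""

    def __init__(self, known):
        self.known = set(known)

    def find_element(self, by, value):
        if by != 'xpath' or value not in self.known:
            raise LookupError(value)
        return object()


# XPaths taken from the example above, plus the <time> lookup used in get_data.
XPATHS = {
    'text_old': './/div[2]/div[2]/div[1]',
    'text_new': './/div[2]/div[2]/div[2]/div',
    'postdate': './/time',
}

card = StubCard(['.//div[2]/div[2]/div[2]/div', './/time'])
report = check_xpaths(card, XPATHS)
for name, ok in report.items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

Any field reported as MISSING on a card that visibly contains a tweet is a candidate for an outdated XPath.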

In my case I did not need all of the tweet metadata, so I kept only the fields I needed and checked that their XPaths are correct. Here is the final code of the get_data() method:

def get_data(card, save_images=False, save_dir=None):
    """Extract data from tweet card"""
    image_links = []

    try:
        username = card.find_element('xpath','.//span').text
    except:
        return

    try:
        handle = card.find_element('xpath','.//span[contains(text(), "@")]').text
    except:
        return

    try:
        postdate = card.find_element('xpath','.//time').get_attribute('datetime')
    except:
        return

    try:
        text = card.find_element('xpath','.//div[2]/div[2]/div[2]/div').text
    except:
        text = ""

    try:
        embedded = card.find_element('xpath','.//div[2]/div[2]/div[2]').text
    except:
        embedded = ""

    # tweet url
    try:
        element = card.find_element('xpath','.//div/div/div[2]/div[2]/div[1]/div/div[1]/div/div/div[2]/div/div[3]/a')
        tweet_url = element.get_attribute('href')
    except:
        return


    tweet = (
        username, handle, postdate, text, embedded, tweet_url)
    return tweet
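Because get_data returns nothing as soon as one required lookup fails, an empty DataFrame gives no hint about which field was responsible. A minimal sketch of a variant that logs the reason before dropping a card follows. The names `read_field`, `get_data_verbose`, `StubElement`, `StubCard`, and the logger name are all hypothetical, the tweet URL lookup is left out for brevity, and the stubs exist only so the snippet runs without a browser.

```python
import logging

logger = logging.getLogger("scweet_debug")  # hypothetical logger name


def read_field(card, xpath, attribute=None):
    """Return the text (or an attribute) of the first match, or None on a miss."""
    try:
        element = card.find_element('xpath', xpath)
    except Exception:  # with Selenium this is typically NoSuchElementException
        return None
    return element.get_attribute(attribute) if attribute else element.text


def get_data_verbose(card):
    """Trimmed get_data that logs which required field caused a card to be dropped."""
    required = {
        'username': ('.//span', None),
        'handle': ('.//span[contains(text(), "@")]', None),
        'postdate': ('.//time', 'datetime'),
    }
    values = {}
    for name, (xpath, attribute) in required.items():
        values[name] = read_field(card, xpath, attribute)
        if values[name] is None:
            logger.warning("dropping card: no match for %s (%s)", name, xpath)
            return None
    # Optional field: fall back to an empty string instead of dropping the card.
    text = read_field(card, './/div[2]/div[2]/div[2]/div') or ""
    return (values['username'], values['handle'], values['postdate'], text)


class StubElement:
    """Hypothetical stand-in for a Selenium WebElement."""

    def __init__(self, text, attrs=None):
        self.text = text
        self.attrs = attrs or {}

    def get_attribute(self, name):
        return self.attrs.get(name)


class StubCard:
    """Hypothetical stand-in for a tweet card: maps XPath strings to elements."""

    def __init__(self, elements):
        self.elements = elements

    def find_element(self, by, value):
        if value not in self.elements:
            raise LookupError(value)
        return self.elements[value]


full = StubCard({
    './/span': StubElement("Alice"),
    './/span[contains(text(), "@")]': StubElement("@alice"),
    './/time': StubElement("", {'datetime': '2023-07-01T10:00:00.000Z'}),
})
no_time = StubCard({
    './/span': StubElement("Bob"),
    './/span[contains(text(), "@")]': StubElement("@bob"),
})

print(get_data_verbose(full))     # tuple with an empty text field
print(get_data_verbose(no_time))  # None, with a warning naming 'postdate'
```

Running the real scraper with a function like this in place should show in the log whether cards are dropped because of the username, handle, or timestamp lookup.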

ihabpalamino · 2023-07-24T15:06:00Z

And does it work?

wdj1995 · 2023-09-12T08:41:28Z

@baqachadil Thanks for your great work!
But when I use your code, I run into a problem.

When the tweet is a reply, such as
"reply to @xxxxx
XXXXXXXXXX the embedded text XXXXXXX",

I only get the "reply to @xxxxx" part and not the actual embedded text.
Could you help me?
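A possible cause, not verified against the live page: find_element returns only the first matching node, and on a reply card the first match may be the "reply to" line rather than the tweet body. One thing to try is Selenium's find_elements (plural), which returns every match, and then joining the texts. The sketch below assumes that approach; `collect_texts`, `StubElement`, `StubCard`, and the XPath constant are hypothetical, and the stubs only let the snippet run without a browser. An attribute-based XPath such as `.//div[@data-testid="tweetText"]` might be more stable than positional paths, but that attribute name is an assumption to confirm in the browser's inspector.

```python
def collect_texts(card, xpath):
    """Join the text of every element matching `xpath`, skipping blanks and repeats."""
    seen = []
    for element in card.find_elements('xpath', xpath):
        text = element.text.strip()
        if text and text not in seen:
            seen.append(text)
    return "\n".join(seen)


class StubElement:
    """Hypothetical stand-in for a Selenium WebElement."""

    def __init__(self, text):
        self.text = text


class StubCard:
    """Hypothetical stand-in for a tweet card: maps an XPath to a list of texts."""

    def __init__(self, matches):
        self.matches = matches

    def find_elements(self, by, value):
        return [StubElement(t) for t in self.matches.get(value, [])]


# Hypothetical path: one level above the text XPath used earlier in the thread.
XPATH = './/div[2]/div[2]/div'

reply_card = StubCard({XPATH: ["reply to @xxxxx", "the embedded text", ""]})
combined = collect_texts(reply_card, XPATH)
print(combined)
```

If the combined string contains both the "reply to" line and the body, the body can then be taken from the later fragments.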

ihabpalamino changed the title ~~UnboundLocalError: local variable 'driver' referenced before assignment~~ getting empty data Jul 12, 2023

This was referenced Jul 18, 2023

Scrape can't get anything #165

Open

Scrape not working? #163

Open
