This project extracts data from websites with Selenium and then manipulates the extracted data with the NumPy and Pandas libraries in Python.
The purpose of this project is to demonstrate how to:
- Use Selenium to automate web browser interactions for data extraction.
- Employ NumPy and Pandas for data manipulation, analysis, and storage.
Ensure you have the following installed:
- Python (3.x recommended)
- Selenium library (pip install selenium)
- NumPy library (pip install numpy)
- Pandas library (pip install pandas)
- WebDriver for your browser (e.g., ChromeDriver for Google Chrome)
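A quick way to confirm the Python prerequisites is to ask the interpreter whether each package can be found. The following is a minimal sketch using only the standard library; the function name `missing_packages` is a hypothetical helper and is not part of this repository.

```python
import importlib.util
import sys

# Package names taken from the prerequisites list above.
REQUIRED = ("selenium", "numpy", "pandas")


def missing_packages(names=REQUIRED):
    """Return the top-level package names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    missing = missing_packages()
    print(f"Python {version}; missing packages:", ", ".join(missing) if missing else "none")
```

This only checks the Python packages. Whether a matching WebDriver binary is present still has to be verified separately.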
- Clone the repository:
  git clone https://github.com/yash3004/extraction_data-selenium-/
- Install the required libraries:
  pip install -r requirements.txt
- Download and place the WebDriver for your browser in the project directory.
- Customize the Selenium scripts (extract_data.py) to target the desired website(s) and data.
- Run the data extraction script:
  python voyalla.py
- The extracted data will be stored in NumPy arrays or Pandas DataFrames based on your script configuration.
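The steps above can be illustrated with a rough sketch of how a scraped HTML table might end up in a Pandas DataFrame. This is not the code in this repository: the function names, the `table` selector, and the `./chromedriver` path are assumptions made for illustration, and the sketch assumes Selenium 4 with ChromeDriver and a target table that has `<th>` header cells.

```python
import pandas as pd


def rows_to_dataframe(header, rows):
    """Build a DataFrame from a header list and a list of row lists."""
    # Skip rows whose cell count does not match the header (e.g. separator rows).
    clean = [row for row in rows if len(row) == len(header)]
    return pd.DataFrame(clean, columns=header)


def scrape_table(url, table_selector="table", driver_path="./chromedriver"):
    """Open `url` in Chrome and return the first matching table as a DataFrame.

    `table_selector` and `driver_path` are illustrative defaults (assumptions).
    """
    # Imported here so rows_to_dataframe stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome(service=Service(executable_path=driver_path))
    try:
        driver.get(url)
        table = driver.find_element(By.CSS_SELECTOR, table_selector)
        header = [th.text.strip() for th in table.find_elements(By.TAG_NAME, "th")]
        rows = []
        for tr in table.find_elements(By.TAG_NAME, "tr"):
            cells = [td.text.strip() for td in tr.find_elements(By.TAG_NAME, "td")]
            if cells:  # header rows contain only <th> cells, so they yield no <td>
                rows.append(cells)
        return rows_to_dataframe(header, rows)
    finally:
        driver.quit()  # always close the browser, even if scraping fails
```

Keeping the DataFrame construction separate from the browser code means the parsing logic can be exercised without launching a browser, for example `rows_to_dataframe(["name", "price"], [["a", "1"]])`.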
- extract_data.py: Contains the Selenium code for web scraping and data extraction.
- data_analysis.py: Demonstrates data manipulation, analysis, and storage using NumPy and Pandas.
- Use voyalla.py to extract tabular data from a website and store it in a Pandas DataFrame.
- Utilize cleaning.py to perform various data manipulations, calculations, or analyses on the extracted data.
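As an illustration of the kind of cleaning step described above, the sketch below parses a scraped text column into numbers and summarizes it with NumPy. The column names, the sample values, and the `clean_prices` helper are hypothetical and do not reflect the actual contents of cleaning.py.

```python
import numpy as np
import pandas as pd


def clean_prices(df, column="price"):
    """Return a copy of `df` with `column` parsed as float and unparsable rows dropped."""
    out = df.copy()
    # Remove currency symbols and thousands separators; keep digits, dot, and minus.
    stripped = out[column].astype(str).str.replace(r"[^0-9.\-]", "", regex=True)
    # Values that still cannot be parsed (e.g. empty strings) become NaN.
    out[column] = pd.to_numeric(stripped, errors="coerce")
    return out.dropna(subset=[column]).reset_index(drop=True)


# Made-up sample data standing in for scraped text.
raw = pd.DataFrame({"name": ["a", "b", "c"], "price": ["$1,200.50", "n/a", "300"]})
cleaned = clean_prices(raw)
values = cleaned["price"].to_numpy()
print(cleaned)
print("mean:", np.mean(values), "max:", np.max(values))
```

Scraped cells arrive as text, so converting them with `errors="coerce"` and dropping the resulting NaN rows is one simple way to get a numeric column; whether dropping or imputing is appropriate depends on the data.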
Contributions are welcome! Feel free to open issues or pull requests for improvements, bug fixes, or additional features.
This project is licensed under the MIT License.