
Commit b7330cb

Merge pull request #25 from ScrapeGraphAI/async-change: Async change
2 parents d03b9bf + 49b8e4b

25 files changed (+7241 -177 lines)

.gitignore (+5, new file)

@@ -0,0 +1,5 @@
+.env
+# Ignore .DS_Store files anywhere in the repository
+.DS_Store
+**/.DS_Store
+*.csv

README.md (+8, -4)

@@ -3,17 +3,21 @@
 [![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
 [![Python SDK](https://img.shields.io/badge/Python_SDK-Latest-blue)](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-py)
 [![JavaScript SDK](https://img.shields.io/badge/JavaScript_SDK-Latest-yellow)](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-js)
-[![Documentation](https://img.shields.io/badge/Documentation-Latest-green)](https://scrapegraphai.com/docs)
+[![Documentation](https://img.shields.io/badge/Documentation-Latest-green)](https://docs.scrapegraphai.com)
+
+<p align="left">
+  <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 70%;">
+</p>

 Official SDKs for the ScrapeGraph AI API - Intelligent web scraping powered by AI. Extract structured data from any webpage with natural language prompts.

-The credits can be bougth [here](https://scrapegraphai.com)!
+Get your [API key](https://scrapegraphai.com)!

 ## 🚀 Quick Links

 - [Python SDK Documentation](scrapegraph-py/README.md)
 - [JavaScript SDK Documentation](scrapegraph-js/README.md)
-- [API Documentation](https://scrapegraphai.com/docs)
+- [API Documentation](https://docs.scrapegraphai.com)
 - [Website](https://scrapegraphai.com)

 ## 📦 Installation

@@ -69,7 +73,7 @@ Extract information from a local HTML file using AI.
 For detailed documentation and examples, visit:
 - [Python SDK Guide](scrapegraph-py/README.md)
 - [JavaScript SDK Guide](scrapegraph-js/README.md)
-- [API Documentation](https://scrapegraphai.com/docs)
+- [API Documentation](https://docs.scrapegraphai.com)

 ## 💬 Support & Feedback

Notebook changes (large diffs are not rendered by default):

- cookbook/chat-webpage-simple-rag/scrapegraph_burr_lancedb.ipynb (+687)
- cookbook/company-info/scrapegraph_langchain.ipynb (+1)
- cookbook/company-info/scrapegraph_llama_index.ipynb (+1,807)
- cookbook/company-info/scrapegraph_sdk.ipynb (+1)
- cookbook/github-trending/scrapegraph_langchain.ipynb (+1)
- cookbook/github-trending/scrapegraph_llama_index.ipynb (+999)
- cookbook/github-trending/scrapegraph_sdk.ipynb (+1)
- cookbook/homes-forsale/scrapegraph_langchain.ipynb (+1)
- cookbook/homes-forsale/scrapegraph_llama_index.ipynb (+799)
- cookbook/homes-forsale/scrapegraph_sdk.ipynb (+1)
- cookbook/research-agent/scrapegraph_langgraph_tavily.ipynb (+1,302)
- cookbook/wired-news/scrapegraph_langchain.ipynb (+1)
- cookbook/wired-news/scrapegraph_langgraph.ipynb (+1)
- cookbook/wired-news/scrapegraph_llama_index.ipynb (+1,438)
- cookbook/wired-news/scrapegraph_sdk.ipynb (+1)

scrapegraph-js/README.md (+6, -5)

@@ -1,9 +1,10 @@
 # 🌐 ScrapeGraph JavaScript SDK

-[![npm version](https://badge.fury.io/js/scrapegraph-js.svg)](https://badge.fury.io/js/scrapegraph-js)
-[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
-[![Build Status](https://github.com/ScrapeGraphAI/scrapegraph-sdk/actions/workflows/ci.yml/badge.svg)](https://github.com/ScrapeGraphAI/scrapegraph-sdk/actions)
-[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg)](https://docs.scrapegraphai.com)
+[![npm version](https://badge.fury.io/js/scrapegraph-js.svg)](https://badge.fury.io/js/scrapegraph-js) [![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT) [![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg)](https://docs.scrapegraphai.com)
+
+<p align="left">
+  <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 70%;">
+</p>

 Official JavaScript/TypeScript SDK for the ScrapeGraph AI API - Smart web scraping powered by AI.

@@ -246,7 +247,7 @@ Contributions are welcome! Please feel free to submit a Pull Request. For major
 ## 🔗 Links

 - [Website](https://scrapegraphai.com)
-- [Documentation](https://scrapegraphai.com/documentation)
+- [Documentation](https://docs.scrapegraphai.com)
 - [GitHub](https://github.com/ScrapeGraphAI/scrapegraph-sdk)

 ## 💬 Support

scrapegraph-js/cookbook/README.md (+9, new file)

@@ -0,0 +1,9 @@
+## 📚 Official Cookbook
+
+Looking for examples and guides? Then head over to the official ScrapeGraph SDK [Cookbook](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/cookbook)!
+
+The cookbook provides step-by-step instructions, practical examples, and tips to help you get started and make the most out of ScrapeGraph SDK.
+
+You will find some colab notebooks with our partners as well, including Langchain 🦜 and LlamaIndex 🦙
+
+Happy scraping! 🚀

scrapegraph-py/README.md (+8, -4)

@@ -4,9 +4,13 @@
 [![Python Support](https://img.shields.io/pypi/pyversions/scrapegraph-py.svg)](https://pypi.org/project/scrapegraph-py/)
 [![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
-[![Documentation Status](https://readthedocs.org/projects/scrapegraph-py/badge/?version=latest)](https://scrapegraph-py.readthedocs.io/en/latest/?badge=latest)
+[![Documentation Status](https://readthedocs.org/projects/scrapegraph-py/badge/?version=latest)](https://docs.scrapegraphai.com)

-Official Python SDK for the ScrapeGraph API - Smart web scraping powered by AI.
+<p align="left">
+  <img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 70%;">
+</p>
+
+Official [Python SDK ](https://scrapegraphai.com) for the ScrapeGraph API - Smart web scraping powered by AI.

 ## 📦 Installation

@@ -142,7 +146,7 @@ asyncio.run(main())

 ## 📖 Documentation

-For detailed documentation, visit [scrapegraphai.com/docs](https://scrapegraphai.com/docs)
+For detailed documentation, visit [docs.scrapegraphai.com](https://docs.scrapegraphai.com)

 ## 🛠️ Development

@@ -173,7 +177,7 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
 ## 🔗 Links

 - [Website](https://scrapegraphai.com)
-- [Documentation](https://scrapegraphai.com/docs)
+- [Documentation](https://docs.scrapegraphai.com)
 - [GitHub](https://github.com/ScrapeGraphAI/scrapegraph-sdk)

 ---

scrapegraph-py/cookbook/README.md (+9, new file)

@@ -0,0 +1,9 @@
+## 📚 Official Cookbook
+
+Looking for examples and guides? Then head over to the official ScrapeGraph SDK [Cookbook](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/cookbook)!
+
+The cookbook provides step-by-step instructions, practical examples, and tips to help you get started and make the most out of ScrapeGraph SDK.
+
+You will find some colab notebooks with our partners as well, such as Langchain 🦜 and LlamaIndex 🦙
+
+Happy scraping! 🚀

scrapegraph-py/pyproject.toml (+1, -1)

@@ -1,6 +1,6 @@
 [project]
 name = "scrapegraph_py"
-version = "0.0.3"
+version = "1.8.0"
 description = "ScrapeGraph Python SDK for API"
 authors = [
     { name = "Marco Vinciguerra", email = "[email protected]" },

scrapegraph-py/scrapegraph_py/async_client.py (+81, -90)

@@ -1,7 +1,6 @@
 import asyncio
 from typing import Any, Optional

-import aiohttp
 from aiohttp import ClientSession, ClientTimeout, TCPConnector
 from aiohttp.client_exceptions import ClientError
 from pydantic import BaseModel
@@ -27,15 +26,15 @@ class AsyncClient:
     def from_env(
         cls,
         verify_ssl: bool = True,
-        timeout: float = 120,
+        timeout: Optional[float] = None,
         max_retries: int = 3,
         retry_delay: float = 1.0,
     ):
         """Initialize AsyncClient using API key from environment variable.

         Args:
             verify_ssl: Whether to verify SSL certificates
-            timeout: Request timeout in seconds
+            timeout: Request timeout in seconds. None means no timeout (infinite)
             max_retries: Maximum number of retry attempts
             retry_delay: Delay between retries in seconds
         """
@@ -56,7 +55,7 @@ def __init__(
         self,
         api_key: str = None,
         verify_ssl: bool = True,
-        timeout: float = 120,
+        timeout: Optional[float] = None,
         max_retries: int = 3,
         retry_delay: float = 1.0,
     ):
@@ -65,7 +64,7 @@ def __init__(
         Args:
             api_key: API key for authentication. If None, will try to load from environment
             verify_ssl: Whether to verify SSL certificates
-            timeout: Request timeout in seconds
+            timeout: Request timeout in seconds. None means no timeout (infinite)
             max_retries: Maximum number of retry attempts
             retry_delay: Delay between retries in seconds
         """
@@ -91,7 +90,7 @@ def __init__(
         self.retry_delay = retry_delay

         ssl = None if verify_ssl else False
-        self.timeout = ClientTimeout(total=timeout)
+        self.timeout = ClientTimeout(total=timeout) if timeout is not None else None

         self.session = ClientSession(
             headers=self.headers, connector=TCPConnector(ssl=ssl), timeout=self.timeout
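The hunk above changes the default from a hard 120-second deadline to `timeout=None`, meaning "wait indefinitely", which suits long-running scraping jobs. The semantics can be illustrated with a small stdlib-only sketch (the helper name `wait_optional` is made up for illustration and is not part of the SDK):

```python
import asyncio

async def wait_optional(coro, timeout=None):
    # Mirrors the client's new behavior: timeout=None waits
    # indefinitely, while a number enforces a hard deadline.
    if timeout is None:
        return await coro
    return await asyncio.wait_for(coro, timeout)

async def job():
    await asyncio.sleep(0.01)
    return "done"

print(asyncio.run(wait_optional(job())))  # done
```

With a deadline shorter than the job, `asyncio.wait_for` cancels the task and raises `asyncio.TimeoutError` instead.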
@@ -137,6 +136,33 @@ async def _make_request(self, method: str, url: str, **kwargs) -> Any:
                 logger.info(f"⏳ Waiting {retry_delay}s before retry {attempt + 2}")
                 await asyncio.sleep(retry_delay)

+    async def markdownify(self, website_url: str):
+        """Send a markdownify request"""
+        logger.info(f"🔍 Starting markdownify request for {website_url}")
+
+        request = MarkdownifyRequest(website_url=website_url)
+        logger.debug("✅ Request validation passed")
+
+        result = await self._make_request(
+            "POST", f"{API_BASE_URL}/markdownify", json=request.model_dump()
+        )
+        logger.info("✨ Markdownify request completed successfully")
+        return result
+
+    async def get_markdownify(self, request_id: str):
+        """Get the result of a previous markdownify request"""
+        logger.info(f"🔍 Fetching markdownify result for request {request_id}")
+
+        # Validate input using Pydantic model
+        GetMarkdownifyRequest(request_id=request_id)
+        logger.debug("✅ Request ID validation passed")
+
+        result = await self._make_request(
+            "GET", f"{API_BASE_URL}/markdownify/{request_id}"
+        )
+        logger.info(f"✨ Successfully retrieved result for request {request_id}")
+        return result
+
     async def smartscraper(
         self,
         website_url: str,
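The refactor in this commit routes every endpoint through `_make_request`, so the retry-and-delay logic lives in a single place instead of being duplicated per method. A stdlib-only sketch of that centralized loop (the names `request_with_retries` and `flaky` are hypothetical, not SDK API):

```python
import asyncio

async def request_with_retries(send, max_retries=3, retry_delay=0.01):
    # One centralized loop in the spirit of _make_request: attempt
    # the call, sleep between failures, re-raise after the final try.
    for attempt in range(max_retries):
        try:
            return await send()
        except ConnectionError:
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(retry_delay)

calls = {"n": 0}

async def flaky():
    # Simulated endpoint: fails twice with a transient error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return {"status": "ok"}

result = asyncio.run(request_with_retries(flaky))
print(result, calls["n"])  # {'status': 'ok'} 3
```

Because each endpoint method now only validates its request and delegates, a change to the retry policy touches one function rather than six.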
@@ -154,17 +180,11 @@ async def smartscraper(
         )
         logger.debug("✅ Request validation passed")

-        try:
-            async with self.session.post(
-                f"{API_BASE_URL}/smartscraper", json=request.model_dump()
-            ) as response:
-                response.raise_for_status()
-                result = await handle_async_response(response)
-                logger.info("✨ Smartscraper request completed successfully")
-                return result
-        except aiohttp.ClientError as e:
-            logger.error(f"❌ Smartscraper request failed: {str(e)}")
-            raise ConnectionError(f"Failed to connect to API: {str(e)}")
+        result = await self._make_request(
+            "POST", f"{API_BASE_URL}/smartscraper", json=request.model_dump()
+        )
+        logger.info("✨ Smartscraper request completed successfully")
+        return result

     async def get_smartscraper(self, request_id: str):
         """Get the result of a previous smartscraper request"""
@@ -174,80 +194,8 @@ async def get_smartscraper(self, request_id: str):
         GetSmartScraperRequest(request_id=request_id)
         logger.debug("✅ Request ID validation passed")

-        async with self.session.get(
-            f"{API_BASE_URL}/smartscraper/{request_id}"
-        ) as response:
-            result = await handle_async_response(response)
-            logger.info(f"✨ Successfully retrieved result for request {request_id}")
-            return result
-
-    async def get_credits(self):
-        """Get credits information"""
-        logger.info("💳 Fetching credits information")
-
-        async with self.session.get(
-            f"{API_BASE_URL}/credits",
-        ) as response:
-            result = await handle_async_response(response)
-            logger.info(
-                f"✨ Credits info retrieved: {result.get('remaining_credits')} credits remaining"
-            )
-            return result
-
-    async def submit_feedback(
-        self, request_id: str, rating: int, feedback_text: Optional[str] = None
-    ):
-        """Submit feedback for a request"""
-        logger.info(f"📝 Submitting feedback for request {request_id}")
-        logger.debug(f"⭐ Rating: {rating}, Feedback: {feedback_text}")
-
-        feedback = FeedbackRequest(
-            request_id=request_id, rating=rating, feedback_text=feedback_text
-        )
-        logger.debug("✅ Feedback validation passed")
-
-        async with self.session.post(
-            f"{API_BASE_URL}/feedback", json=feedback.model_dump()
-        ) as response:
-            result = await handle_async_response(response)
-            logger.info("✨ Feedback submitted successfully")
-            return result
-
-    async def close(self):
-        """Close the session to free up resources"""
-        logger.info("🔒 Closing AsyncClient session")
-        await self.session.close()
-        logger.debug("✅ Session closed successfully")
-
-    async def __aenter__(self):
-        return self
-
-    async def __aexit__(self, exc_type, exc_val, exc_tb):
-        await self.close()
-
-    async def markdownify(self, website_url: str):
-        """Send a markdownify request"""
-        logger.info(f"🔍 Starting markdownify request for {website_url}")
-
-        request = MarkdownifyRequest(website_url=website_url)
-        logger.debug("✅ Request validation passed")
-
-        result = await self._make_request(
-            "POST", f"{API_BASE_URL}/markdownify", json=request.model_dump()
-        )
-        logger.info("✨ Markdownify request completed successfully")
-        return result
-
-    async def get_markdownify(self, request_id: str):
-        """Get the result of a previous markdownify request"""
-        logger.info(f"🔍 Fetching markdownify result for request {request_id}")
-
-        # Validate input using Pydantic model
-        GetMarkdownifyRequest(request_id=request_id)
-        logger.debug("✅ Request ID validation passed")
-
         result = await self._make_request(
-            "GET", f"{API_BASE_URL}/markdownify/{request_id}"
+            "GET", f"{API_BASE_URL}/smartscraper/{request_id}"
         )
         logger.info(f"✨ Successfully retrieved result for request {request_id}")
         return result
@@ -288,3 +236,46 @@ async def get_localscraper(self, request_id: str):
         )
         logger.info(f"✨ Successfully retrieved result for request {request_id}")
         return result
+
+    async def submit_feedback(
+        self, request_id: str, rating: int, feedback_text: Optional[str] = None
+    ):
+        """Submit feedback for a request"""
+        logger.info(f"📝 Submitting feedback for request {request_id}")
+        logger.debug(f"⭐ Rating: {rating}, Feedback: {feedback_text}")
+
+        feedback = FeedbackRequest(
+            request_id=request_id, rating=rating, feedback_text=feedback_text
+        )
+        logger.debug("✅ Feedback validation passed")
+
+        result = await self._make_request(
+            "POST", f"{API_BASE_URL}/feedback", json=feedback.model_dump()
+        )
+        logger.info("✨ Feedback submitted successfully")
+        return result
+
+    async def get_credits(self):
+        """Get credits information"""
+        logger.info("💳 Fetching credits information")
+
+        result = await self._make_request(
+            "GET",
+            f"{API_BASE_URL}/credits",
+        )
+        logger.info(
+            f"✨ Credits info retrieved: {result.get('remaining_credits')} credits remaining"
+        )
+        return result
+
+    async def close(self):
+        """Close the session to free up resources"""
+        logger.info("🔒 Closing AsyncClient session")
+        await self.session.close()
+        logger.debug("✅ Session closed successfully")
+
+    async def __aenter__(self):
+        return self
+
+    async def __aexit__(self, exc_type, exc_val, exc_tb):
+        await self.close()
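The commit moves `close`, `__aenter__`, and `__aexit__` to the end of the class, but the resource-management contract is unchanged: `__aexit__` delegates to `close()`, so the session is freed whether or not the body raises. A stdlib-only sketch of that protocol (the class `MiniClient` is an illustrative stand-in, not the SDK's `AsyncClient`):

```python
import asyncio

class MiniClient:
    # Same context-manager shape as the diff above:
    # __aexit__ delegates to close(), so resources are
    # always released when the `async with` block exits.
    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        await self.close()

async def main():
    async with MiniClient() as client:
        assert not client.closed  # session is open inside the block
    return client.closed          # and closed on exit

print(asyncio.run(main()))  # True
```

Using `async with` rather than calling `close()` manually guarantees cleanup even when an awaited request inside the block raises.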
