[Feature] Add best-seller crawling feature #42
Merged
4 commits
    @@ -1,7 +1,10 @@
     package com.jisungin.infra.crawler;

    +import java.util.Map;
    +
     public interface Crawler {

         CrawlingBook crawlBook(String isbn);
    +    Map<Long, CrawlingBook> crawlBestSellerBook();

     }
37 changes: 32 additions & 5 deletions
src/main/java/com/jisungin/infra/crawler/CrawlingBook.java
    @@ -1,29 +1,56 @@
     package com.jisungin.infra.crawler;

    +import java.time.LocalDateTime;
     import lombok.Builder;
     import lombok.Getter;
    +import lombok.ToString;

     @Getter
    +@ToString
     public class CrawlingBook {

    -    private String imageUrl;
    +    private String title;
         private String content;
    +    private String isbn;
    +    private String publisher;
    +    private String imageUrl;
    +    private String thumbnail;
    +    private String[] authors;
    +    private LocalDateTime dateTime;

         @Builder
    -    private CrawlingBook(String imageUrl, String content) {
    -        this.imageUrl = imageUrl;
    +    private CrawlingBook(String title, String content, String isbn, String publisher, String imageUrl, String thumbnail,
    +            String authors, LocalDateTime dateTime) {
    +        this.title = title;
             this.content = content;
    +        this.isbn = isbn;
    +        this.publisher = publisher;
    +        this.imageUrl = imageUrl;
    +        this.thumbnail = thumbnail;
    +        this.authors = parseAuthorsToArr(authors);
    +        this.dateTime = dateTime;
         }

    -    public static CrawlingBook of(String imageUrl, String content) {
    +    public static CrawlingBook of(String title, String content, String isbn, String publisher, String imageUrl,
    +            String thumbnail, String authors, LocalDateTime dateTime) {
             return CrawlingBook.builder()
    -                .imageUrl(imageUrl)
    +                .title(title)
                     .content(content)
    +                .isbn(isbn)
    +                .publisher(publisher)
    +                .imageUrl(imageUrl)
    +                .thumbnail(thumbnail)
    +                .authors(authors)
    +                .dateTime(dateTime)
                     .build();
         }

         public boolean isBlankContent() {
             return this.content.isBlank();
         }

    +    private String[] parseAuthorsToArr(String authors) {
    +        return authors.split(" 저| 공저| 글| 편저| 원저")[0].split(",");
    +    }
    +
     }
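The `parseAuthorsToArr` helper in this diff cuts the author string at the first role suffix (저, 공저, 글, 편저, 원저) and splits what is left on commas. Below is a minimal standalone sketch of that behaviour, not the PR's code. The class name `AuthorSplitSketch` and the sample string are made up for illustration. Trimming each name and tolerating null or blank input are additions the PR does not make; they are shown only as one possible hardening.

```java
import java.util.Arrays;

final class AuthorSplitSketch {

    // Same alternation as in the diff: a space followed by a role suffix.
    private static final String ROLE_SUFFIX = " 저| 공저| 글| 편저| 원저";

    // Hypothetical helper: returns the author names that precede the first role suffix.
    static String[] parseAuthors(String authors) {
        if (authors == null || authors.isBlank()) {
            return new String[0]; // assumption: missing input means "no authors"
        }
        // A limit of 2 keeps trailing empty strings, so index 0 always exists.
        String names = authors.split(ROLE_SUFFIX, 2)[0];
        return Arrays.stream(names.split(","))
                .map(String::trim)                 // drop the space left after each comma
                .filter(name -> !name.isEmpty())   // skip empty fragments such as "a,,b"
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Made-up sample input, not taken from the crawled site.
        System.out.println(Arrays.toString(parseAuthors("홍길동, 김철수 공저")));
    }
}
```

Without the `trim` step, a source string with a space after each comma would leave that leading space on every name after the first.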
    @@ -1,10 +1,12 @@
     package com.jisungin.infra.crawler;

    +import java.util.Map;
     import org.jsoup.nodes.Document;

     public interface Parser {

         String parseIsbn(Document doc);
         CrawlingBook parseBook(Document doc);
    +    Map<Long, String> parseBestSellerBookId(Document doc);

     }
23 changes: 0 additions & 23 deletions
src/main/java/com/jisungin/infra/crawler/Yes24CrawlerConstant.java
This file was deleted.
69 changes: 59 additions & 10 deletions
src/main/java/com/jisungin/infra/crawler/Yes24Parser.java
    @@ -1,29 +1,78 @@
     package com.jisungin.infra.crawler;

    -import static com.jisungin.infra.crawler.Yes24CrawlerConstant.BOOK_CONTENT_CSS;
    -import static com.jisungin.infra.crawler.Yes24CrawlerConstant.BOOK_IMAGE_ATTR;
    -import static com.jisungin.infra.crawler.Yes24CrawlerConstant.BOOK_IMAGE_CSS;
    -import static com.jisungin.infra.crawler.Yes24CrawlerConstant.ISBN_ATTR;
    -import static com.jisungin.infra.crawler.Yes24CrawlerConstant.ISBN_CSS;
    -
    +import com.jayway.jsonpath.JsonPath;
    +import java.time.LocalDate;
    +import java.time.LocalDateTime;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.stream.Collectors;
    +import java.util.stream.IntStream;
    +import lombok.Setter;
     import org.jsoup.Jsoup;
     import org.jsoup.nodes.Document;
    +import org.jsoup.nodes.Element;
     import org.jsoup.safety.Safelist;
    +import org.jsoup.select.Elements;
    +import org.springframework.boot.context.properties.ConfigurationProperties;
     import org.springframework.stereotype.Component;

     @Component
    +@Setter
    +@ConfigurationProperties(prefix = "crawler.yes24.parser")
     public class Yes24Parser implements Parser {

    +    private String isbnCss;
    +    private String isbnAttr;
    +    private String bookContentCss;
    +    private String bookJsonCss;
    +    private String bestRankingCss;
    +    private String bestIdCss;
    +    private String bestIdAttrs;
    +
         @Override
         public String parseIsbn(Document doc) {
    -        return doc.select(ISBN_CSS).attr(ISBN_ATTR);
    +        return doc.select(isbnCss).attr(isbnAttr);
         }

         @Override
         public CrawlingBook parseBook(Document doc) {
    -        String image = doc.select(BOOK_IMAGE_CSS).attr(BOOK_IMAGE_ATTR);
    -        String content = Jsoup.clean(doc.select(BOOK_CONTENT_CSS).text(), Safelist.none());
    +        String json = doc.select(bookJsonCss).html();
    +
    +        String title = parseJsonToString(json, "$.name");
    +        String isbn = parseJsonToString(json, "$.workExample[0].isbn");
    +        String imageUrl = parseJsonToString(json, "$.image");
    +        String publisher = parseJsonToString(json, "$.publisher.name");
    +        String authors = parseJsonToString(json, "$.author.name");
    +        String thumbnail = imageUrl.replace("XL", "M");
    +        String content = Jsoup.clean(doc.select(bookContentCss).text(), Safelist.none());
    +        LocalDateTime dateTime = parseDate(parseJsonToString(json, "$.workExample[0].datePublished"));
    +
    +        return CrawlingBook.of(title, content, isbn, publisher, imageUrl, thumbnail, authors, dateTime);
    +    }
    +
    +    @Override
    +    public Map<Long, String> parseBestSellerBookId(Document doc) {
    +        Elements rankings = doc.select(bestRankingCss);
    +        List<String> bookIds = doc.select(bestIdCss)
    +                .eachAttr(bestIdAttrs);
    +
    +        return IntStream.range(0, rankings.size())
    +                .boxed()
    +                .collect(Collectors.toMap(
    +                        i -> parseRanking(rankings.get(i)),
    +                        bookIds::get));
    +    }
    +
    +    private Long parseRanking(Element rankingElement) {
    +        return Long.parseLong(rankingElement.text());
    +    }
    +
    +    private String parseJsonToString(String json, String path) {
    +        return JsonPath.read(json, path);
    +    }
    +
    -        return CrawlingBook.of(image, content);
    +    private LocalDateTime parseDate(String dateString) {
    +        return LocalDate.parse(dateString).atStartOfDay();
         }

     }
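`parseBestSellerBookId` pairs the i-th ranking element with the i-th book id through `Collectors.toMap`. Two properties of that approach are worth keeping in mind: the two-argument `Collectors.toMap` throws `IllegalStateException` on a duplicate key, and `bookIds::get` throws `IndexOutOfBoundsException` if the id list is shorter than the ranking list. The sketch below shows one defensive variant of the same pairing in plain Java, assuming the ranking texts and ids have already been extracted as strings. The class `RankingPairSketch`, its method name, and the sample ids are hypothetical and not part of the PR.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RankingPairSketch {

    // Pairs the i-th ranking text with the i-th book id, stopping at the shorter list
    // and keeping the first id seen when a ranking number repeats.
    static Map<Long, String> pairRankingsWithIds(List<String> rankingTexts, List<String> bookIds) {
        int size = Math.min(rankingTexts.size(), bookIds.size());
        Map<Long, String> idsByRanking = new LinkedHashMap<>(); // preserves page order
        for (int i = 0; i < size; i++) {
            long ranking = Long.parseLong(rankingTexts.get(i).trim());
            idsByRanking.putIfAbsent(ranking, bookIds.get(i));
        }
        return idsByRanking;
    }

    public static void main(String[] args) {
        // Made-up sample data: three rankings but only two ids.
        System.out.println(pairRankingsWithIds(List.of("1", "2", "3"), List.of("1001", "1002")));
    }
}
```

Whether silently dropping unmatched or repeated entries is preferable to failing fast depends on how the crawler's caller treats partial results, which this PR view does not show.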
    @@ -10,4 +10,5 @@ spring:
       prod-env:
         - prod
       include:
    -    oauth
    +    - oauth
    +    - crawler
Did you use `@Setter` in order to bind the configuration values for crawling? Is there another way to do it?
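
Yes, that's right. `@ConfigurationProperties` lets the values in the yaml file be changed dynamically. For that reason, the created bean needs `@Setter` so that the values from the `yaml` file can be set on it.

As for another way, the values can be injected with `@Value`. The drawback is that a path has to be specified for every single field; the advantage is that the injected fields can be declared `final`.

Since there are many fields to inject at the moment, I used `@ConfigurationProperties`.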

Oh, I see! Thank you.
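
The trade-off raised in this thread, mutable fields filled in through setters versus `final` fields filled in through a constructor, can be illustrated in plain Java without Spring. The sketch below is only a stand-in: `SelectorBindingSketch`, both holder classes, the two `bind...` methods, and the property keys and values are hypothetical, and the hand-written map lookup merely imitates what a binder would do. Spring Boot also documents constructor binding for `@ConfigurationProperties` classes; whether that fits here depends on the Boot version and on how the bean is registered, neither of which is visible in this PR.

```java
import java.util.Map;
import java.util.Objects;

final class SelectorBindingSketch {

    // Setter-style holder: fields stay non-final so they can be populated after construction.
    static final class MutableSelectors {
        private String isbnCss;
        private String isbnAttr;

        void setIsbnCss(String isbnCss) { this.isbnCss = isbnCss; }
        void setIsbnAttr(String isbnAttr) { this.isbnAttr = isbnAttr; }
        String getIsbnCss() { return isbnCss; }
        String getIsbnAttr() { return isbnAttr; }
    }

    // Constructor-style holder: every value arrives through the constructor, so fields can be final.
    static final class ImmutableSelectors {
        private final String isbnCss;
        private final String isbnAttr;

        ImmutableSelectors(String isbnCss, String isbnAttr) {
            // Fail at construction time if a value is missing.
            this.isbnCss = Objects.requireNonNull(isbnCss, "isbnCss");
            this.isbnAttr = Objects.requireNonNull(isbnAttr, "isbnAttr");
        }

        String getIsbnCss() { return isbnCss; }
        String getIsbnAttr() { return isbnAttr; }
    }

    // Hypothetical stand-in for setter-based binding.
    static MutableSelectors bindWithSetters(Map<String, String> props) {
        MutableSelectors selectors = new MutableSelectors();
        selectors.setIsbnCss(props.get("isbn-css"));
        selectors.setIsbnAttr(props.get("isbn-attr"));
        return selectors;
    }

    // Hypothetical stand-in for constructor-based binding.
    static ImmutableSelectors bindWithConstructor(Map<String, String> props) {
        return new ImmutableSelectors(props.get("isbn-css"), props.get("isbn-attr"));
    }

    public static void main(String[] args) {
        // Placeholder keys and values, not the project's real configuration.
        Map<String, String> props = Map.of("isbn-css", "css-placeholder", "isbn-attr", "attr-placeholder");
        System.out.println(bindWithSetters(props).getIsbnCss());
        System.out.println(bindWithConstructor(props).getIsbnAttr());
    }
}
```

In the setter variant a missing key leaves the field `null` until it is first used, while the constructor variant rejects it immediately.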