Refactor sync script (#175)
* Refactor dev.to sync script

* Update workflow file

* Prepare classes for http adapters

* Added tests. Changed code to use http adapters

* Update coderabbit ignore rules

* fix coderabbit warnings

* Update bin/sync/images_downloader.rb

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Update bin/sync/sync.rb

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Update test/integration/sync_with_devto_test.rb

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Update test/integration/sync_with_devto_test.rb

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Move retries to module

* chore: fix code style

* sync posts

* run sync

* run sync

* run sync

* run sync

* fix tests

* Skip canonical URL update on dev.to for articles with matching local slug and up-to-date canonical URL

* synced dev.to

---------

Co-authored-by: Dmitry Gorodnichy <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Paul Keen <[email protected]>
4 people authored Nov 13, 2024
1 parent c93b8a5 commit d182950
Showing 240 changed files with 4,258 additions and 1,999 deletions.
2 changes: 2 additions & 0 deletions .coderabbit.yaml
@@ -2,5 +2,7 @@ early_access: true
 reviews:
   path_filters:
     - "!content/blog/!*.md"
+    - "!content/blog/sync_status.yml"
+    - "!test/fixtures/"
     - "!wp-content/**"
     - "!wp-includes/**"
4 changes: 2 additions & 2 deletions .github/workflows/sync-and-publish.yml
@@ -33,9 +33,9 @@ jobs:
         run: |
           if [ "${{ github.event.inputs.force }}" = "true" ]
           then
-            bin/from_devto -f
+            bin/sync_with_devto -f
           else
-            bin/from_devto
+            bin/sync_with_devto
           fi
           bin/upload_assets_to_github
1 change: 1 addition & 0 deletions Gemfile
@@ -4,6 +4,7 @@ source "https://rubygems.org"
 gem "minitest"
 gem "capybara"
 gem "launchy"
+gem "httparty"
 gem "selenium-webdriver"
 gem "rack"
 gem "rackup"
16 changes: 13 additions & 3 deletions Gemfile.lock
@@ -59,13 +59,18 @@ GEM
     concurrent-ruby (1.3.4)
     connection_pool (2.4.1)
     crass (1.0.6)
+    csv (3.3.0)
     drb (2.2.1)
     erubi (1.13.0)
     ffi (1.17.0)
     ffi (1.17.0-arm64-darwin)
+    httparty (0.22.0)
+      csv
+      mini_mime (>= 1.0.0)
+      multi_xml (>= 0.5.2)
     i18n (1.14.6)
       concurrent-ruby (~> 1.0)
-    json (2.7.6)
+    json (2.7.5)
     language_server-protocol (3.17.0.3)
     launchy (3.0.1)
       addressable (~> 2.8)
@@ -79,6 +84,8 @@ GEM
     mini_mime (1.1.5)
     mini_portile2 (2.8.7)
     minitest (5.25.1)
+    multi_xml (0.7.1)
+      bigdecimal (~> 3.1)
     mutex_m (0.2.0)
     nio4r (2.7.4)
     nokogiri (1.16.7)
@@ -99,8 +106,9 @@ GEM
       rack (>= 3.0.0)
     rack-test (2.1.0)
       rack (>= 1.3)
-    rackup (2.2.0)
+    rackup (2.1.0)
       rack (>= 3)
+      webrick (~> 1.8)
     rails-dom-testing (2.2.0)
       activesupport (>= 5.0.0)
       minitest
@@ -121,7 +129,7 @@ GEM
       rubocop-ast (>= 1.32.2, < 2.0)
       ruby-progressbar (~> 1.7)
       unicode-display_width (>= 2.4.0, < 3.0)
-    rubocop-ast (1.33.1)
+    rubocop-ast (1.33.0)
       parser (>= 3.3.1.0)
     rubocop-performance (1.22.1)
       rubocop (>= 1.48.1, < 2.0)
@@ -153,6 +161,7 @@ GEM
     useragent (0.16.10)
     vips (8.15.1)
       ffi (~> 1.12)
+    webrick (1.9.0)
     websocket (1.2.11)
     xpath (3.2.0)
       nokogiri (~> 1.8)
@@ -165,6 +174,7 @@ DEPENDENCIES
   bigdecimal
   capybara
   capybara-screenshot-diff!
+  httparty
   launchy
   minitest
   mutex_m
170 changes: 0 additions & 170 deletions bin/from_devto

This file was deleted.

49 changes: 49 additions & 0 deletions bin/sync/article_cleaner.rb
@@ -0,0 +1,49 @@
+require "fileutils"
+require "yaml"
+
+module ArticleCleaner
+  SYNC_STATUS_FILE = "sync_status.yml".freeze
+  ARTICLE_FILE = "index.md".freeze
+
+  def cleanup_renamed_articles
+    raise ArgumentError, "Working directory doesn't exist" unless Dir.exist?(working_dir)
+
+    deleted_folders = []
+    slugs = load_slugs_from_yaml
+
+    Dir.glob("#{working_dir}/*").each do |folder_path|
+      next unless File.directory?(folder_path) && File.exist?("#{folder_path}/#{ARTICLE_FILE}")
+
+      folder_name = File.basename(folder_path)
+      unless slugs.include?(folder_name)
+        begin
+          FileUtils.rm_rf(folder_path)
+          deleted_folders << folder_name
+          puts "Deleted folder: #{folder_name}"
+        rescue => e
+          puts "Failed to delete folder #{folder_name}: #{e.message}"
+        end
+      end
+    end
+    deleted_folders
+  end
+
+  private
+
+  def load_slugs_from_yaml
+    yaml_path = File.join(working_dir, SYNC_STATUS_FILE)
+
+    begin
+      yaml_data = YAML.load_file(yaml_path)
+      raise "Invalid YAML structure" unless yaml_data.is_a?(Hash)
+
+      yaml_data.values.map do |article|
+        raise "Invalid article data structure" unless article.is_a?(Hash) && article[:slug]
+        article[:slug]
+      end
+    rescue => e
+      logger.error "Failed to load slugs from YAML: #{e.message}"
+      []
+    end
+  end
+end
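
Reviewer note: `cleanup_renamed_articles` removes every folder that contains an `index.md` but whose name is not among the slugs in `sync_status.yml`. Note that if the status file cannot be loaded, `load_slugs_from_yaml` returns `[]`, so every article folder would count as stale. A dry run can help check the selection before anything is deleted. The sketch below is not part of the commit: `stale_article_folders` is a hypothetical helper, and it assumes the status file is a hash of article ids to hashes with a `:slug` key, as the diff suggests.

```ruby
require "fileutils"
require "tmpdir"
require "yaml"

# Hypothetical dry-run helper (not in the commit): lists the folders that the
# cleanup would delete, without removing anything.
def stale_article_folders(working_dir)
  status_path = File.join(working_dir, "sync_status.yml")
  # The status file uses symbol keys, so Symbol must be permitted explicitly.
  status = YAML.safe_load(File.read(status_path), permitted_classes: [Symbol])
  slugs = status.values.map { |article| article[:slug] }

  Dir.glob(File.join(working_dir, "*"))
    .select { |path| File.directory?(path) && File.exist?(File.join(path, "index.md")) }
    .map { |path| File.basename(path) }
    .reject { |name| slugs.include?(name) }
    .sort
end

# Usage against a throwaway directory with made-up folder names.
Dir.mktmpdir do |dir|
  %w[current-post old-post-name].each do |name|
    FileUtils.mkdir_p(File.join(dir, name))
    File.write(File.join(dir, name, "index.md"), "# #{name}\n")
  end
  FileUtils.mkdir_p(File.join(dir, "assets")) # no index.md, so it is never a candidate
  File.write(File.join(dir, "sync_status.yml"), {1 => {slug: "current-post", synced: true}}.to_yaml)

  p stale_article_folders(dir) # prints ["old-post-name"]
end
```

Only folders with an `index.md` are considered, so asset or fixture directories next to the articles are left alone, which matches the guard in the diff.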
59 changes: 59 additions & 0 deletions bin/sync/article_sync_checker.rb
@@ -0,0 +1,59 @@
+require "json"
+
+module ArticleSyncChecker
+  USERNAME = "jetthoughts".freeze
+  SYNC_STATUS_FILE = "sync_status.yml".freeze
+  USELESS_WORDS = %w[and the a but to is so].freeze
+
+  def update_sync_status
+    ensure_sync_status_file_exists
+    @sync_status = sync_status
+    update_status(fetch_articles)
+    save_sync_status
+  end
+
+  private
+
+  def ensure_sync_status_file_exists
+    sync_file_path = File.join(working_dir, SYNC_STATUS_FILE)
+
+    unless File.exist?(sync_file_path)
+      File.write(sync_file_path, {}.to_yaml)
+    end
+  end
+
+  def save_sync_status
+    File.write(File.join(working_dir, SYNC_STATUS_FILE), @sync_status.to_yaml)
+  end
+
+  def fetch_articles
+    response = http_client.get_articles(USERNAME, 0)
+    JSON.parse(response.body)
+  end
+
+  def slug(article)
+    slug_parts = article["slug"].split("-")[0..-2]
+    tags = article["tags"] ? article["tags"].split(", ") : []
+    selected_tags = tags.first(2)
+    [slug_parts, selected_tags]
+      .flatten
+      .uniq
+      .reject { |segment| USELESS_WORDS.include?(segment) }
+      .compact
+      .join("-")
+  end
+
+  def update_status(articles)
+    articles.each do |article|
+      id = article["id"]
+      edited_at = article["edited_at"] || article["created_at"]
+
+      @sync_status[id] ||= {edited_at: edited_at, slug: slug(article), synced: false}
+
+      if @sync_status[id][:edited_at] != edited_at
+        @sync_status[id][:edited_at] = edited_at
+        @sync_status[id][:synced] = false
+      end
+    end
+  end
+end
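
Reviewer note: the slug rule and the status update in `ArticleSyncChecker` are easier to follow on a concrete input. The sketch below copies both steps into standalone functions (`local_slug` and a two-argument `update_status` are illustrative names, not the commit's API) and runs them on a made-up article hash. The payload shape is an assumption based on the keys the diff reads: `id`, `slug`, `tags`, `edited_at`, `created_at`.

```ruby
USELESS_WORDS = %w[and the a but to is so].freeze

# Standalone copy of the slug rule, for illustration only.
def local_slug(article)
  # Drop the last segment of the remote slug (presumably dev.to's random suffix).
  slug_parts = article["slug"].split("-")[0..-2]
  tags = article["tags"] ? article["tags"].split(", ") : []
  (slug_parts + tags.first(2))
    .uniq
    .reject { |segment| USELESS_WORDS.include?(segment) }
    .join("-")
end

# Standalone copy of the status update; takes the status hash explicitly
# instead of reading @sync_status.
def update_status(sync_status, articles)
  articles.each do |article|
    id = article["id"]
    edited_at = article["edited_at"] || article["created_at"]

    # First sighting: record the article as not yet synced.
    entry = (sync_status[id] ||= {edited_at: edited_at, slug: local_slug(article), synced: false})
    next if entry[:edited_at] == edited_at

    # Edited remotely since the last run: mark it for re-sync.
    entry[:edited_at] = edited_at
    entry[:synced] = false
  end
  sync_status
end

article = {
  "id" => 1,
  "slug" => "how-to-sync-the-blog-with-devto-4k2p",
  "tags" => "ruby, devto, automation",
  "created_at" => "2024-11-01T10:00:00Z",
  "edited_at" => nil
}
puts local_slug(article) # prints how-sync-blog-with-devto-ruby

status = update_status({}, [article])
status[1][:synced] = true # pretend the article was synced locally

edited = article.merge("edited_at" => "2024-11-13T09:30:00Z")
update_status(status, [edited])
p status[1][:synced] # prints false
```

Because the entry is created with `||=`, the stored slug is computed only the first time an article is seen. A later edit resets `synced` but keeps the original local slug, even if the remote slug or tags have changed.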