ilyalasy

Stars

Web Parsing

Web page classification, element classification, boilerplate removal, etc.
19 repositories

Work in progress transmit from Google Code

Java 1,116 288 Updated Jan 3, 2018

Boilerplate Removal using Deep Learning

Python 81 18 Updated Jan 23, 2022

JavaScript object that creates a unique CSS selector for a given element.

TypeScript 554 93 Updated Mar 11, 2025

Python 1 Updated Jan 31, 2024

Python APTED algorithm for the Tree Edit Distance

Python 91 13 Updated Nov 8, 2017

An efficient approximation for tree edit-distance.

Python 8 1 Updated Feb 9, 2020

A PyTorch implementation of "SimGNN: A Neural Network Approach to Fast Graph Similarity Computation" (WSDM 2019).

Python 780 147 Updated Jan 12, 2023

SIGIR-2022 Webformer: Pre-training with Web Pages for Information Retrieval

Python 47 11 Updated Sep 20, 2022

WebRED is a large and diverse manually annotated dataset for extracting relationships from a variety of text found on the World Wide Web.

22 4 Updated Mar 11, 2021

Graph-based Deep Q Network for Web Navigation

Python 46 10 Updated Jul 8, 2019

WebNav: A New Large-Scale Task for Natural Language based Sequential Decision Making

Python 82 12 Updated Apr 15, 2017

Simplified DOM Trees for Transferable Attribute Extraction from the Web

Python 38 7 Updated Sep 27, 2024

Python package (to be) for converting raw HTML files to IR vectors, either Python dictionaries or NumPy arrays. Aims to transparently handle encoding and HTML issues.

Python 10 Updated Jun 5, 2021

Algorithm that converts an HTML document to a vectorized object suitable for neural networks.

Python 13 2 Updated Nov 2, 2020

Formasaurus tells you the type of an HTML form and its fields using machine learning.

HTML 118 48 Updated Jun 18, 2024

tNodeEmbed

Python 79 17 Updated Feb 5, 2023

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 20,883 2,605 Updated Mar 4, 2025

Python 635 56 Updated Feb 28, 2025