fix ci issue #52

Merged
merged 11 commits into from
May 31, 2024
4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
@@ -15,8 +15,8 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
- python-version: ['3.7', '3.8', '3.9', '3.10']
- os: [ubuntu-latest, macos-latest, windows-latest]
+ python-version: ['3.8', '3.9', '3.10', '3.11']
Contributor

@takojunior it seems "windows-latest" is failing while installing one or more libraries. I assume that's not something we can fix easily. In this case, I am OK with dropping Windows testing. That's in line with our jump to v2.0, where we drop support for both Python 3.7 and Windows. Maybe remove Windows and merge? Please add the Windows change to the CHANGELOG as well.

Contributor Author

Makes sense to me. I see the updates to CHANGELOG. I will update PyPI.

+ os: [ubuntu-latest, macos-latest]
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }} on ${{ matrix.os }}
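
As a quick sanity check on the matrix change, the sketch below enumerates the jobs the updated matrix would produce. It is plain Python, not part of the workflow; the variable names are illustrative, and the values are copied from the updated `ci.yml`. GitHub Actions runs one job per combination of matrix values, so the change takes the matrix from 4 × 3 = 12 jobs to 4 × 2 = 8.

```python
from itertools import product

# Values copied from the updated ci.yml matrix in this PR.
python_versions = ["3.8", "3.9", "3.10", "3.11"]
operating_systems = ["ubuntu-latest", "macos-latest"]

# One job is created for every (python-version, os) combination.
jobs = [
    {"python-version": py, "os": os_name}
    for py, os_name in product(python_versions, operating_systems)
]

for job in jobs:
    print(f"Python {job['python-version']} on {job['os']}")

print(f"Total jobs: {len(jobs)}")
```

The versions are quoted in the YAML because an unquoted `3.10` would be read as the number 3.1.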
12 changes: 12 additions & 0 deletions CHANGELOG.txt
@@ -2,6 +2,18 @@
CHANGELOG
=========

-------------------------------------------------------------------------------
May, 15, 2024 2.0.0
-------------------------------------------------------------------------------

Major:
- Update CI test environment to Python 3.8, 3.9, 3.10, 3.11 and drop support for Python 3.7
- Update installation requirement to Python 3.8+
- Update CI test environment to drop support for Windows-latest tests

Minor:
- New section in README to explain max_span and batch_size parameters for mining large sequence databases.

-------------------------------------------------------------------------------
Apr, 12, 2023 1.4.0
-------------------------------------------------------------------------------
23 changes: 20 additions & 3 deletions README.md
@@ -49,6 +49,23 @@ seq2pat.add_constraint(3 <= price.average() <= 4)
patterns = seq2pat.get_patterns(min_frequency=2)
```

### Mining Large Sequence Databases
Contributor

new section added

Seq2Pat provides two parameters for mining large sequence databases efficiently. The Seq2Pat constructor accepts `max_span`, the maximum span parameter that controls the columns, i.e., attributes, and `batch_size`, the batch size parameter that controls the rows, i.e., the sequences.

* **Maximum Span:** The span of a pattern is controlled by the [max_span](https://github.com/fidelity/seq2pat/blob/master/sequential/seq2pat.py#L297) parameter. By default, the span is restricted to ten so that general users do not run into performance issues out of the box. Setting `max_span=None` removes this restriction.

* **Batch Size:** The number of sequences in each batch used for pattern mining is controlled by [batch_size](https://github.com/fidelity/seq2pat/blob/master/sequential/seq2pat.py#L303). By default, the batch size is not restricted, meaning the entire dataset is mined at once as long as its size does not exceed `dynamic_batch_threshold`. If the input dataset is larger than the [dynamic batch size threshold](https://github.com/fidelity/seq2pat/blob/master/sequential/seq2pat.py#L131), batching is activated automatically using the [default batch size](https://github.com/fidelity/seq2pat/blob/master/sequential/seq2pat.py#L135). The final set of patterns is the aggregation of the patterns found over all batches. The `min_frequency` is still enforced, with a [discount_factor](https://github.com/fidelity/seq2pat/blob/master/sequential/seq2pat.py#L315) applied to each batch. Mining in batches can return different results from mining the entire set; the chance of this is minimized by using a small discount factor. By default, the discount factor is set to 0.2. For further speed-up, batch mining can be parallelized using the [n_jobs](https://github.com/fidelity/seq2pat/blob/master/sequential/seq2pat.py#L324) parameter. By default, the number of jobs is set to two.

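```python
# Seq2Pat parameters to consider when dealing with large sequence databases.
# `sequences` is a placeholder for the large sequence database (a list of lists).
from sequential.seq2pat import Seq2Pat

seq2pat = Seq2Pat(sequences=sequences,
                  max_span=10,          # maximum pattern span; None removes the limit
                  batch_size=10000,     # number of sequences mined per batch
                  discount_factor=0.2,  # discount applied to each batch for min_frequency
                  n_jobs=2)             # number of parallel jobs for batch mining
```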


### Dichotomic Pattern Mining
```python
# Example to show how to run Dichotomic Pattern Mining
@@ -102,19 +119,19 @@ Examples on how to use the available constraints can be found
in the [Usage Example Notebook](https://github.com/fidelity/seq2pat/blob/master/notebooks/sequential_pattern_mining.ipynb).
You can also find out how to scale up the mining capability by running Seq2Pat on batches of sequences in parallel in the [Batch Processing Notebook](https://github.com/fidelity/seq2pat/blob/master/notebooks/batch_processing.ipynb).
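
To make the batching idea concrete, here is a toy sketch in plain Python. It is not Seq2Pat's implementation: single items stand in for patterns, the frequency threshold is treated as a fraction of rows, and the discount factor is assumed to multiply the per-batch threshold. The function names are hypothetical. The sketch shows why a small discount factor reduces the chance that batched results differ from mining the full set.

```python
from collections import Counter
from math import ceil


def frequent_items(sequences, min_fraction):
    """Return {item: count} for items found in at least min_fraction of the sequences."""
    threshold = ceil(min_fraction * len(sequences))
    counts = Counter(item for seq in sequences for item in set(seq))
    return {item: c for item, c in counts.items() if c >= threshold}


def frequent_items_in_batches(sequences, min_fraction, batch_size, discount_factor):
    """Mine each batch with a relaxed threshold, then enforce the full threshold."""
    totals = Counter()
    for start in range(0, len(sequences), batch_size):
        batch = sequences[start:start + batch_size]
        # ASSUMPTION: the discount factor scales the per-batch threshold down.
        totals.update(frequent_items(batch, min_fraction * discount_factor))
    # Aggregate over batches and re-apply the original threshold.
    threshold = ceil(min_fraction * len(sequences))
    return {item: c for item, c in totals.items() if c >= threshold}


sequences = [["A", "B"], ["A", "C"], ["A", "B"], ["C", "D"],
             ["A", "D"], ["B", "D"], ["B", "C"], ["B", "D"]]

full = frequent_items(sequences, min_fraction=0.5)
relaxed = frequent_items_in_batches(sequences, 0.5, batch_size=4, discount_factor=0.2)
strict = frequent_items_in_batches(sequences, 0.5, batch_size=4, discount_factor=1.0)

print("full set:            ", sorted(full))
print("batched, factor 0.2: ", sorted(relaxed))
print("batched, factor 1.0: ", sorted(strict))
```

In this toy data, items `A` and `D` are frequent overall but unevenly spread across the two batches. With no discount (factor 1.0) they fall below the per-batch threshold in one batch and are lost; with a factor of 0.2 the batched result matches the full run.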

- Supported by Seq2Pat, we proposed **Dichotomic Pattern Mining** ([X. Wang and S. Kadioglu, 2022](https://arxiv.org/abs/2201.09178)) to analyze the correlations between
+ Supported by Seq2Pat, we proposed **Dichotomic Pattern Mining (DPM)** ([X. Wang and S. Kadioglu, 2022](https://arxiv.org/abs/2201.09178)) to analyze the correlations between
mined patterns and different outcomes of sequences. DPM allows generating feature vectors based on mined patterns and plays an integrator role between Sequential
Pattern Mining and the downstream modeling tasks as shown in [Ghosh et al., Frontiers'22](https://www.frontiersin.org/articles/10.3389/frai.2022.868085/full) for clickstream intent prediction and intruder detection. An example of how to run DPM and generate pattern embeddings can be found in
[Dichotomic Pattern Mining Notebook](https://github.com/fidelity/seq2pat/blob/master/notebooks/dichotomic_pattern_mining.ipynb).

## Installation

- Seq2Pat can be installed from PyPI using ``pip install seq2pat``. It can also be installed from source by following the instructions in
+ Seq2Pat can be installed from PyPI using ```pip install seq2pat```. It can also be installed from source by following the instructions in
our [documentation](https://fidelity.github.io/seq2pat/installation.html).

### Requirements

- The library requires ```Python 3.7+```, the ```Cython``` package, and a ```C++``` compiler.
+ The library requires **Python 3.8+**, the ```Cython``` package, and a ```C++``` compiler.
See [requirements.txt](requirements.txt) for dependencies.
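
A script that depends on the library could guard against unsupported interpreters up front. The sketch below is a hypothetical helper, not part of Seq2Pat; `MIN_PYTHON` and `meets_requirement` are names made up for this illustration, and the 3.8 floor is taken from the updated requirement.

```python
import sys

# Minimum interpreter version, taken from the updated Requirements section.
MIN_PYTHON = (3, 8)


def meets_requirement(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if meets_requirement():
    print("Python version is supported.")
else:
    print("Python 3.8 or newer is required.")
```

Comparing version tuples rather than strings matters here: as strings, `"3.10"` sorts before `"3.8"`.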

## Support
20 changes: 13 additions & 7 deletions docs/_modules/index.html
@@ -1,11 +1,13 @@
<!DOCTYPE html>
- <html class="writer-html5" lang="en" >
+ <html class="writer-html5" lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Overview: module code &mdash; Seq2Pat documentation</title>
- <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
- <link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
+ <link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
+ <link rel="stylesheet" type="text/css" href="../_static/css/theme.css" />
+
+
<!--[if lt IE 9]>
<script src="../_static/js/html5shiv.min.js"></script>
<![endif]-->
@@ -25,11 +27,15 @@
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
- <a href="../index.html" class="icon icon-home"> Seq2Pat
+
+
+
+ <a href="../index.html" class="icon icon-home">
+ Seq2Pat
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
- <input type="text" name="q" placeholder="Search docs" />
+ <input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
@@ -55,8 +61,8 @@
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
- <li><a href="../index.html" class="icon icon-home"></a> &raquo;</li>
- <li>Overview: module code</li>
+ <li><a href="../index.html" class="icon icon-home" aria-label="Home"></a></li>
+ <li class="breadcrumb-item active">Overview: module code</li>
<li class="wy-breadcrumbs-aside">
</li>
</ul>