diff --git a/LICENSE b/LICENSE
index ab3aa59..7c78c2d 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,6 +1,6 @@
MIT License
-Copyright (c) 2022 Benedetto Polimeni
+Copyright (c) 2022-2024 Benedetto Polimeni
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
diff --git a/README.md b/README.md
index 892ad98..c10d0b0 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@
# IRescue - Interspersed Repeats single-cell quantifier
-IRescue is a software for quantifying the expression of transposable elements (TEs) subfamilies in single cell RNA sequencing (scRNA-seq) data. The core feature of IRescue is to consider all multiple alignments (i.e. non-primary alignments) of reads/UMIs mapping on multiple TEs in a BAM file, to accurately infer the TE subfamily of origin. IRescue implements a UMI error-correction, deduplication and quantification strategy that includes such alignment events. IRescue's output is compatible with most scRNA-seq analysis toolkits, such as Seurat or Scanpy.
+IRescue quantifies the expression of transposable element (TE) subfamilies in single cell RNA sequencing (scRNA-seq) data. It performs UMI deduplication with sequencing error correction and probabilistic assignment of multi-mapping reads via an expectation-maximization (EM) procedure. TE counts are written to a sparse matrix (similar to Cell Ranger's output) compatible with Seurat, Scanpy and other toolkits.
## Content
@@ -34,7 +34,7 @@ conda create -n irescue -c conda-forge -c bioconda irescue
### Using pip
-If for any reason it's not possible or desiderable to use conda, it can be installed with pip and the following requirements must be installed manually: `python>=3.7`, `samtools>=1.12`, `bedtools>=2.30.0`, and fairly recent versions of the GNU utilities are required, specifically `coreutils>=8.30` and `gzip>=1.10` (older versions are untested).
+If for any reason it's not possible or desirable to use conda, IRescue can be installed with pip, provided the following requirements are installed manually: `python>=3.7`, `samtools>=1.12`, `bedtools>=2.30.0`, and fairly recent versions of the GNU utilities, specifically `gawk>=5.0.1`, `coreutils>=8.30` and `gzip>=1.10` (older versions are untested).
```bash
pip install irescue
@@ -57,29 +57,36 @@ singularity exec https://depot.galaxyproject.org/singularity/irescue:$TAG irescu
## Usage
-### Quick start
+```sh
+irescue --help
+```
+
+The only required input is a BAM file annotated with cell barcode and UMI sequences as tags (by default, `CB` tag for cell barcode and `UR` tag for UMI; override with `--cb-tag` and `--umi-tag`).
-The only required input is a BAM file annotated with cell barcode and UMI sequences as tags (by default, `CB` tag for cell barcode and `UR` tag for UMI; override with `--CBtag` and `--UMItag`). You can obtain it by aligning your reads using [STARsolo](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md).
+You can obtain it by aligning your reads using [STARsolo](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md). It is advised to keep secondary alignments in the BAM file, as they will be used in the EM procedure to assign multi-mapping reads (e.g. `--outFilterMultimapNmax 100 --winAnchorMultimapNmax 100` or more), and remember to output all the needed SAM attributes (e.g. `--outSAMattributes NH HI AS nM NM MD jM jI XS MC ch cN CR CY UR UY GX GN CB UB sM sS sQ`).
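+
+As a quick sanity check, you can verify that reads carry the expected tags. Below is a minimal sketch using `pysam` (already an IRescue dependency); the BAM file name is a placeholder:
+
+```python
+import pysam
+
+# scan the first reads for cell barcode (CB) and UMI (UR) tags
+with pysam.AlignmentFile("genome_alignments.bam", "rb") as bam:
+    for i, read in enumerate(bam.fetch(until_eof=True)):
+        if read.has_tag("CB") and read.has_tag("UR"):
+            print("CB and UR tags found")
+            break
+        if i >= 1000:
+            print("no CB/UR tags in the first 1000 reads")
+            break
+```
+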
RepeatMasker annotation will be automatically downloaded for the chosen genome assembly (e.g. `-g hg38`), or provide your own annotation in bed format (e.g. `-r TE.bed`).
-```bash
+```sh
irescue -b genome_alignments.bam -g hg38
```
-If you already obtained gene-level counts (using STARsolo, Cell Ranger, Alevin, Kallisto or other tools), it is advised to provide the whitelisted cell barcodes list as a text file, e.g.: `-w barcodes.tsv`. This will significantly improve performance.
+If you already obtained gene-level counts (using STARsolo, Cell Ranger, Alevin, Kallisto or other tools), it is advised to provide the list of whitelisted cell barcodes as a text file (`-w barcodes.tsv`). This will significantly improve performance by processing viable cells only.
-IRescue performs best using at least 4 threads, e.g.: `-p 8`.
+For optimal run time, use at least 4 threads, e.g. `-p 8`.
### Output files
-IRescue generates TE counts in a sparse matrix format, readable by [Seurat](https://github.com/satijalab/seurat) or [Scanpy](https://github.com/scverse/scanpy):
+IRescue writes TE counts in a sparse matrix format, readable by [Seurat](https://github.com/satijalab/seurat) or [Scanpy](https://github.com/scverse/scanpy), into a `counts/` subdirectory. Optional outputs include a description of equivalence classes with UMI deduplication stats (`ec_dump.tsv.gz`) and a subdirectory of temporary files (`tmp/`) for debugging purposes. Detailed logging is enabled by `--verbose` and written to standard error.
```
-IRescue_out/
-├── barcodes.tsv.gz
-├── features.tsv.gz
-└── matrix.mtx.gz
+irescue_out/
+├── counts/
+│ ├── barcodes.tsv.gz
+│ ├── features.tsv.gz
+│ └── matrix.mtx.gz
+├── ec_dump.tsv.gz
+└── tmp/
```
### Load IRescue data with Seurat
@@ -108,8 +115,10 @@ Active assay: RNA (31078 features, 0 variable features)
1 other assay present: TE
```
+From here, TE expression can be normalized, and dimensionality reduction can be performed on either TE or gene expression.
+
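+In Scanpy, a minimal sketch for loading the matrix as an AnnData object (assuming the default `irescue_out` output directory) would be:
+
+```python
+import scanpy as sc
+
+# reads matrix.mtx.gz, barcodes.tsv.gz and features.tsv.gz from the directory
+adata = sc.read_10x_mtx("irescue_out/counts")
+```
+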
## Cite
Polimeni B, Marasca F, Ranzani V, Bodega B.
-IRescue: single cell uncertainty-aware quantification of transposable elements expression.
+*IRescue: uncertainty-aware quantification of transposable elements expression at single cell level.*
bioRxiv 2022.09.16.508229; doi: https://doi.org/10.1101/2022.09.16.508229
diff --git a/irescue/_version.py b/irescue/_version.py
index 4980bee..236fd8c 100644
--- a/irescue/_version.py
+++ b/irescue/_version.py
@@ -1 +1 @@
-__version__ = '1.1.0-beta.1'
+__version__ = '1.1.0-beta.2'
diff --git a/irescue/count.py b/irescue/count.py
index 3a10b0f..6d055e8 100644
--- a/irescue/count.py
+++ b/irescue/count.py
@@ -1,372 +1,308 @@
#!/usr/bin/env python
+from collections import Counter
+from itertools import combinations
import numpy as np
-from irescue.misc import getlen, writerr, flatten, run_shell_cmd, iupac_nt_code
+import networkx as nx
+from irescue.misc import get_ranges, getlen, writerr, run_shell_cmd
+from irescue.network import build_substr_idx, gen_ec_pairs
+from irescue.em import run_em
import gzip
import os
-def find_mm(x, y):
+class EquivalenceClass:
+ def __init__(
+ self,
+ index: int,
+ umi: bytes,
+ features: set,
+ count: int
+ ) -> None:
+ self.index = index
+ self.umi = umi
+ self.features = features
+ self.count = count
+ def to_tuple(self):
+ return (self.umi, self.features, self.count)
+ def hdist(self, umi):
+ return sum(1 for i, j in zip(self.umi, umi) if i != j)
+ def connect(self, eqc, threshold):
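+        # Directional adjacency criterion (cf. UMI-tools): connect ECs whose
+        # counts satisfy count >= 2 * other - 1, that share at least one
+        # feature, and whose UMIs are within the hamming distance threshold.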
+ return (self.count >= (2 * eqc.count) - 1
+ and self.features.intersection(eqc.features)
+ and self.hdist(eqc.umi) <= threshold)
+
+def pathfinder(graph, node, path=None, features=None):
"""
- Calculate number of mismatches between sequences of the same length
+    Finds the first valid path of equivalence classes with compatible
+    features, given a starting node. Can be used iteratively to find all
+    possible paths.
"""
- if len(x) != len(y):
- return -1
- mm = 0
- for i in range(len(x)):
- if x[i] != y[i]:
- mm += 1
- return mm
+    if path is None:
+        path = []
+    if not features:
+        features = graph.nodes[node]['ft']
+    path += [node]
+ for next_node in graph.successors(node):
+ if (features.intersection(graph.nodes[next_node]['ft'])
+ and next_node not in path):
+ path = pathfinder(graph, next_node, path, features)
+ return path
+
+def index_features(features_file):
+ idx = {}
+ with gzip.open(features_file, 'rb') as f:
+ for i, line in enumerate(f, start=1):
+ ft = line.strip().split(b'\t')[0]
+ idx[ft] = i
+ return idx
-def collapse_networks(graph):
+def parse_maps(maps_file, feature_index):
"""
- Collapse a UMI graph to a graph of the smallest number of hubs.
-
- Parameters
- ----------
- graph: dict
- A dictionary with nodes as keys and the set of adjacent nodes
- (including the node itself) as values.
- e.g.: {0: {0,1,2}, 1: {0,1}, 2: {0,2,3}}
+    Parse a mappings file and yield equivalence classes grouped by cell.
+
+    maps_file : str
+        Gzipped mappings file with lines of "CB UMI FEATs count".
+    yields : (bytes, list)
+        Cell barcode and its equivalence classes:
+        (CB, [(UMI, {feature_index, ...}, count), ...])
"""
- out = dict()
- for key, value in graph.items():
- out[key] = []
- if len(value) == 1:
- # check if it's a single node, then add to the output and go to the next node
- out[key].append(value)
- continue
- for val in graph.values():
- # check if there are other nodes that contains all the values of the current one
- if all(i in value for i in val):
- out[key].append(val)
- if len(out[key]) <= 1:
- out.popitem()
- return out
+ with gzip.open(maps_file, 'rb') as f:
+ cb, umi, feat, count = f.readline().strip().split(b'\t')
+ i = 0
+ it = cb
+ count = int(count)
+ feat = {feature_index[ft] for ft in feat.split(b',')}
+ eqcl = [EquivalenceClass(i, umi, feat, count)]
+ for line in f:
+ cb, umi, feat, count = line.strip().split(b'\t')
+ count = int(count)
+ feat = {feature_index[ft] for ft in feat.split(b',')}
+ if cb == it:
+ i += 1
+ eqcl.append(EquivalenceClass(i, umi, feat, count))
+ else:
+ yield it, eqcl
+ it = cb
+ i = 0
+ eqcl = [EquivalenceClass(i, umi, feat, count)]
+ yield it, eqcl
-# calculate counts of a cell from mappings dictionary
-def cellCount(maps, intcount=False, dumpec=False):
+def compute_cell_counts(equivalence_classes, features_index, dumpEC):
"""
- Deduplicate UMI counts of a cell.
+ Calculate TE counts of a single cell, given a list of equivalence classes.
Parameters
----------
- maps: dict
- Dictionary of all UMI-TE mappings of the cell.
- e.g.: {UMI: {TE_1, TE_2}}
- intcount: bool
- Convert all counts to integer.
- dumpec:
- Make a list of rows for the Equivalence Classes dump (to use with
- --dumpEC on)
+    equivalence_classes : list
+        List of EquivalenceClass objects, each holding a UMI sequence,
+        a set of TE indices and a read count.
+
+ Returns
+ -------
+    out : dict, dict
+        Feature-count dictionary and equivalence class dump metadata
+        (None unless dumpEC is set).
"""
-
- # get and index equivalence classes from maps
- eclist = list()
- for v in maps.values():
- eclist.append(tuple(sorted(v.keys())))
- eclist = sorted(list(set(eclist)))
-
- # make a simple mapping dict (index number in place of families) and its reverse
- smaps = dict([(i,eclist.index(tuple(sorted(j.keys())))) for i,j in maps.items()])
- rsmaps = dict()
- for key, value in smaps.items():
- rsmaps.setdefault(value, list()).append(key)
-
- # compute the count of each equivalence class in the cell barcode
- counts = dict()
- ec_log = []
- for ec in rsmaps:
- # list of UMIs associated to EC
- umis = rsmaps[ec]
- ### compute the total count of the equivalence class
- if len(umis) > 1:
- ### Find and collapse duplicated UMIs ###
- # Make an NxN array of number of mismatches between N UMIs
- mm_arr = np.array([[find_mm(ux,i) for i in umis] for ux in umis])
- # Find UMI pairs with up to 1 mismatch, where UMIs are representad
- # by integers: [[i, j], [i, k], [k, m]]
- mm_check = np.argwhere(mm_arr <= 1)
- # Make a graph that connects UMIs with <=1 mismatches
- # {NODE: EDGES} or {UMI: [CONNECTED_UMIS]}
- graph = dict()
- for key, value in mm_check:
- graph.setdefault(key, set()).add(value)
- # Check if all nodes are connected (i.e. complete graph)
- if all([x == set(graph.keys()) for x in graph.values()]):
- # Set EC final count to 1
- ec_count = 1
- if dumpec:
- mm = [
- (i, j) for i, j in enumerate(
- [set(x) for x in zip(*umis)]
- )
- if len(j) > 1
- ]
- if len(mm) == 1:
- mm = mm[0]
- if mm:
- iupac = iupac_nt_code(mm[1])
- umis_dedup = list(umis[0])
- umis_dedup[mm[0]] = iupac
- umis_dedup = [''.join(umis_dedup)]
- else:
- umis_dedup = [''.join(umis_dedup)]
- else:
- # Collapse networks based on UMI similarity: {HUB: [UMI_GRAPHS]}
- coll_nets = collapse_networks(graph)
- # Get EC final count after collapsing
- ec_count = len(coll_nets)
- #
- if dumpec:
- umis_dedup = [umis[x] for x in coll_nets]
-
+ # initialize TE counts and dedup log
+ counts = Counter()
+ dump = None
+ number_of_features = len(features_index)
+ # build cell-wide UMI deduplication graph
+ graph = nx.DiGraph()
+ # add nodes with annotated features
+ graph.add_nodes_from(
+ [(x.index, {'ft': x.features, 'c': x.count})
+ for x in equivalence_classes]
+ )
+ # make an iterator of umi pairs
+ if len(equivalence_classes) > 25:
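+        # with many ECs, avoid the quadratic all-vs-all comparison by
+        # indexing UMIs on substrings and only pairing ECs that share one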
+ umi_length = len(equivalence_classes[0].umi)
+ substr_idx = build_substr_idx(equivalence_classes, umi_length, 1)
+ iter_ec_pairs = gen_ec_pairs(equivalence_classes, substr_idx)
+ else:
+ iter_ec_pairs = combinations(equivalence_classes, 2)
+ for x, y in iter_ec_pairs:
+ # add edges to graph
+ if x.connect(y, 1):
+ graph.add_edge(x.index, y.index)
+ if y.connect(x, 1):
+ graph.add_edge(y.index, x.index)
+ if dumpEC:
+ # collect graph metadata in a dictionary
+ dump = {i: equivalence_classes[i].to_tuple() for i in graph.nodes}
+ # split cell-wide graph into subgraphs of connected nodes
+ subgraphs = [graph.subgraph(x) for x in
+ nx.connected_components(graph.to_undirected())]
+ # put aside networks that will be solved with EM
+ em_array = []
+ # solve UMI deduplication for each subgraph of connected nodes
+ for subg in subgraphs:
+ # find all parent nodes in graph
+ parents = [x for x in subg if not list(subg.predecessors(x))]
+ if not parents:
+ # if no parents are found due to bidirected edges, take all nodes
+ # and the union of all features (i.e. all nodes are parents).
+ parents = list(subg.nodes)
+ features = [list(set.union(*[subg.nodes[x]['ft'] for x in subg]))]
else:
- # If only one umi, skip collapsing and assign 1 to the final count
- ec_count = 1
- if dumpec:
- umis_dedup = umis
-
- ### find the predominant TE family in the equivalence class
- # make count matrix from mappings (row = UMI, column = TE)
- ec_counts = np.array([list(j.values()) for i,j in maps.items() if i in rsmaps[ec]])
- # sum counts by TE
- ec_sum = ec_counts.sum(axis = 0)
- # find the index of the highest count
- ec_max = np.argwhere(ec_sum == ec_sum.max()).flatten()
-
- # retrieve the TEs with highest count
- te_max = list()
- for i in ec_max:
- te_max.append(eclist[ec][i])
-
- # add count
- for te in te_max:
- # initialize the feature in the cell barcode dictionary
- if te not in counts:
- counts[te] = 0
- # get the normalized count by dividing the raw count by the number of predominant TEs
- norm_count = ec_count / len(te_max)
- # if integers are needed, round the normalized count
- if intcount:
- norm_count = round(norm_count)
- # add count to dictionary
- counts[te] += norm_count
-
- # dump EC
- if dumpec:
- if umis == umis_dedup:
- umis_dedup = ['-']
- ec_log.append('\t'.join([
- str(ec), # EC index
- ','.join(eclist[ec]), # EC name
- ','.join(umis), # Raw UMIs
- str(len(umis)), # Raw count
- ','.join(umis_dedup), # Deduplicated UMIs
- str(ec_count), # Deduplicated count
- ','.join(te_max) # Filtered TEs
- ]) + '\n')
-
- return ec_log, counts
-
-def parse_features(features_file):
- """
- Parses the features.tsv file, assigns an index (int) for each feature and
- yields (index, feature) tuples.
- """
- with gzip.open(features_file, 'rb') as f:
- for i, line in enumerate(f):
- l = line.decode('utf-8').strip().split('\t')
- yield (l[0], i+1)
-
-def split_int(num, div):
- """
- Splits an integer X into N integers whose sum is equal to X.
- """
- split = int(num/div)
- for i in range(0, num, split):
- j = i + split
- if j > num-split:
- j = num
- yield range(i, j)
- break
- yield range(i, j)
-
-def split_bc(barcode_file, n):
- """
- Yields barcodes (index,sequence) tuples in n chunks.
- """
- bclen = getlen(barcode_file)
- #split = round(bclen/n)
- with gzip.open(barcode_file, 'rb') as f:
- c=0
- for chunk in split_int(bclen, n):
- yield (c,[(next(f).decode('utf-8').strip(),x+1) for x in chunk])
- c+=1
-
-def count(
- mappings_file, outdir, tmpdir, features, intcount, dumpec, verbose,
- bc_split
-):
+            # if parent nodes are found, features will be determined below.
+ features = None
+ # initialize dict of possible paths configurations, starting from
+ # each parent node.
+ paths = {x: [] for x in parents}
+ # find paths starting from each parent node
+ for parent in parents:
+ # populate this list with nodes utilized in paths
+ blacklist = []
+ # find paths in list of nodes starting from parent
+ path = []
+ subg_copy = subg.copy()
+ nodes = [parent] + [x for x in subg_copy if x != parent]
+ for node in nodes:
+ # make a copy of subgraph and remove nodes already used
+ # in a path
+ if node not in blacklist:
+ path = pathfinder(subg_copy, node, path=[], features=None)
+ for x in path:
+ blacklist.append(x)
+ subg_copy.remove_node(x)
+ paths[parent].append(path)
+ # find the path configuration leading to the minimum number of
+ # deduplicated UMIs -> list of lists of nodes
+ path_config = [
+ paths[k] for k, v in paths.items()
+ if len(v) == min([len(x) for x in paths.values()])
+ ][0]
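+        # (ties between equally sized configurations are broken by taking
+        # the first one)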
+ if not features:
+ # take features from parent node of selected path configuration
+ features = [list(subg.nodes[x[0]]['ft']) for x in path_config]
+ else:
+            # if features was already determined (i.e. no parent nodes),
+            # repeat the features list once per path in path_config to
+            # avoid going out of range
+ features *= len(path_config)
+ # assign UMI count to features
+ for feats in features:
+ if len(feats) == 1:
+ counts[feats[0]] += 1.0
+ elif len(feats) > 1:
+ row = [1 if x in feats else 0
+ for x in range(1, number_of_features+1)]
+ em_array.append(row)
+ else:
+                # log debug info on the unexpected graph state
+                writerr(str(nx.to_dict_of_lists(subg)))
+                writerr(str([subg.nodes[x]['ft'] for x in subg.nodes]))
+                writerr(str([subg.nodes[x]['c'] for x in subg.nodes]))
+                writerr(str(path_config))
+                writerr(str(path))
+                writerr(str(features))
+                writerr(str(feats))
+ writerr("Error: no common features detected in subgraph's"
+ " path.", error=True)
+ # add EC log to dump
+ if dumpEC:
+ for i, path_ in enumerate(path_config):
+ # add empty fields to parent node
+ parent_ = path_[0]
+ path_.pop(0)
+ dump[parent_] += (b'', b'')
+            # if child nodes are present, add parent node information
+ for x in path_:
+ # add parent's UMI sequence and dedup features
+ dump[x] += (dump[parent_][0], features[i])
+ if em_array:
+ # optimize the assignment of UMI from multimapping reads
+ em_array = np.array(em_array)
+ # save an array with features > 0, as in em_array order
+ tokeep = np.argwhere(np.any(em_array[..., :] > 0, axis=0))[:,0] + 1
+ # remove unmapped features from em_array
+ todel = np.argwhere(np.all(em_array[..., :] == 0, axis=0))
+ em_array = np.delete(em_array, todel, axis=1)
+ # run EM
+ em_counts = run_em(em_array, cycles=100)
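+        # run_em returns relative abundances summing to one; scale by the
+        # number of multi-mapping UMIs to obtain counts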
+ em_counts = [x*em_array.shape[0] for x in em_counts]
+ for i, c in zip(tokeep, em_counts):
+ if c > 0:
+ counts[i] += c
+ return dict(counts), dump
+
+def split_barcodes(barcodes_file, n):
"""
- Run cellCount() for a set of barcodes.
-
- Parameters
+    Yields barcodes in n chunks as (chunk_index, {barcode: column_index}).
+
+    Parameters
----------
- mappings_file: str
- File containing UMI-TE mappings (3-columns text of CB-UMI-TE)
- outdir: str
- Output dir to write into.
- tmpdir: str
- Directory to write temporary files into.
- features: list
- List of (index, feature) tuples, generated with parse_features().
- intcount: bool
- Convert all counts to integer.
- dumpec: bool
- Write a report of equivalence classes and UMI deduplication.
- verbose: bool
- Be verbose.
- bc_split: list
- List of barcodes to process, generated with split_bc().
+    barcodes_file : str
+        Gzipped barcodes file.
+    n : int
+        Number of chunks.
"""
- '''Runs cellCount for a set of barcodes'''
- os.makedirs(outdir, exist_ok=True)
- os.makedirs(tmpdir, exist_ok=True)
-
- # set temporary matrix name prefix as chunk number
- chunkn = bc_split[0]
- matrix_file = os.path.join(tmpdir, f'{chunkn}_matrix.mtx.gz')
-
- # parse barcodes in a SEQUENCE:INDEX dictionary
- barcodes = dict(bc_split[1])
- writerr(
- f'Processing {len(barcodes)} barcodes from chunk {chunkn}',
- send=verbose
- )
-
- # get number of lines in mappings_file
- nlines = getlen(mappings_file)
-
- # initialize mappings dictionary {UMI: {FEATURE: COUNT}}
- maps = dict()
-
- # cell barcode placeholder
- cell = None
-
- with gzip.open(mappings_file, 'rb') as data, \
- gzip.open(matrix_file, 'wb') as mtxFile:
-
- if dumpec:
- ec_dump_file = os.path.join(tmpdir, f'{chunkn}_ec_dump.tsv.gz')
- ecdump = gzip.open(ec_dump_file, 'wb')
- else:
- ec_dump_file = None
-
- for line in enumerate(data, start=1):
- # gather barcode, umi and feature from mappings file
- cx, ux, te = line[1].decode('utf-8').strip().split('\t')
- if '~' in te:
- te = te[:te.index('~')]
-
- if len(barcodes)==0:
- # interrupt loop when reaching the end of the barcodes chunk
- break
-
- if not cell:
- # skip to the first cell barcode contained in the current
- # barcodes chunk
- if cx not in barcodes:
- continue
- else:
- cell = cx
-
- # if cell barcode changes, compute counts from previous cell's
- # mappings
- if cx != cell and cell in barcodes:
- cellidx = barcodes.pop(cell)
- writerr(
- f'[{chunkn}] Computing counts for cell barcode {cellidx} '
- '({cell})',
- send=verbose
- )
- # compute final counts of the cell
- ec_log, counts = cellCount(
- maps,
- intcount=intcount,
- dumpec=dumpec
- )
- # arrange counts in a data frame and write to text file
- lines = [f'{features[k]} {str(cellidx)} {str(v)}\n'.encode() \
- for k, v in counts.items()]
- mtxFile.writelines(lines)
- if dumpec:
- ec_log = [f'{str(cellidx)}\t{cell}\t{x}'.encode() \
- for x in ec_log]
- ecdump.writelines(ec_log)
- # re-initialize mappings dict
- maps = dict()
-
- # reassign cell to current barcode
- cell = cx
-
- # add features count to mappings dict
- if cx in barcodes:
- #teidx = features[te]
- if ux not in maps:
- # initialize UMI if not in mappings dict
- maps[ux] = dict()
- if te in maps[ux]:
- # initialize feature count for UMI
- maps[ux][te]+=1
- else:
- # add count to existing feature in UMI
- maps[ux][te]=1
-
- # if end of file is reached, compute counts from current cell's
- # mappings
- if line[0] == nlines and cell in barcodes:
- cellidx = barcodes.pop(cell)
- writerr(
- f'[{chunkn}] [file_end] Computing counts for cell '
- f'barcode {cellidx} ({cell})',
- send=verbose
- )
- # compute final counts of the cell
- ec_log, counts = cellCount(
- maps,
- intcount=intcount,
- dumpec=dumpec
- )
- # arrange counts in a data frame and write to text file
- lines = [f'{features[k]} {str(cellidx)} {str(v)}\n'.encode() \
- for k, v in counts.items()]
- mtxFile.writelines(lines)
- if dumpec:
- ec_log = [f'{str(cellidx)}\t{cell}\t{x}'.encode() \
- for x in ec_log]
- ecdump.writelines(ec_log)
- if dumpec:
- ecdump.close()
+ nBarcodes = getlen(barcodes_file)
+ with gzip.open(barcodes_file, 'rb') as f:
+ for i, chunk in enumerate(get_ranges(nBarcodes, n)):
+ yield i, {next(f).strip(): x+1 for x in chunk}
+
+def run_count(maps_file, features_index, tmpdir, dumpEC, verbose,
+ barcodes_set):
+ # NB: keep args order consistent with main.countFun
+ taskn, barcodes = barcodes_set
+ matrix_file = os.path.join(tmpdir, f'{taskn}_matrix.mtx.gz')
+ dump_file = os.path.join(tmpdir, f'{taskn}_EqCdump.tsv.gz')
+ with (gzip.open(matrix_file, 'wb') as f,
+ gzip.open(dump_file, 'wb') if dumpEC
+          else gzip.open(os.devnull, 'wb') as df):
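+        # when --dump-ec is off, df is a throwaway handle on os.devnull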
+ for cellbarcode, cellmaps in parse_maps(maps_file, features_index):
+ if cellbarcode not in barcodes:
+ continue
+ cellidx = barcodes[cellbarcode]
writerr(
- f'Equivalence Classes dump file written to {ec_dump_file}',
+ f'[{taskn}] Run count for cell '
+ f'{cellidx} ({cellbarcode.decode()})',
send=verbose
)
- writerr(f'Barcodes chunk {chunkn} written to {matrix_file}', send=verbose)
- return matrix_file, ec_dump_file
-
-# Concatenate matrices in a single MatrixMarket file with proper header
-def formatMM(matrix_files, outdir, features, barcodes):
+ cellcounts, dump = compute_cell_counts(
+ equivalence_classes=cellmaps,
+ features_index=features_index,
+ dumpEC=dumpEC
+ )
+ writerr(
+ f'[{taskn}] Write count for cell '
+ f'{cellidx} ({cellbarcode.decode()})',
+ send=verbose
+ )
+ # round counts to 3rd decimal point and write to matrix file
+ # only if count is at least 0.001
+ lines = [f'{feature} {cellidx} {round(count, 3)}\n'.encode()
+ for feature, count in cellcounts.items()
+ if count >= 0.001]
+ f.writelines(lines)
+ if dumpEC:
+ writerr(
+ f'[{taskn}] Write ECdump for cell '
+ f'{cellidx} ({cellbarcode.decode()})',
+ send=verbose
+ )
+ # reverse features index to get names back
+ findex = dict(zip(features_index.values(),
+ features_index.keys()))
+ dumplines = [
+ b'\t'.join(
+ [str(cellidx).encode(),
+ cellbarcode,
+ str(i).encode(),
+ umi,
+ b','.join([findex[f] for f in feats]),
+ str(count).encode(),
+ pumi,
+ b','.join([findex[f] for f in pfeats])]
+ ) + b'\n'
+ for i, (umi, feats, count, pumi, pfeats) in dump.items()
+ ]
+ df.writelines(dumplines)
+ return matrix_file, dump_file
+
+def formatMM(matrix_files, feature_index, barcodes_chunks, outdir):
if type(matrix_files) is str:
matrix_files = [matrix_files]
matrix_out = os.path.join(outdir, 'matrix.mtx.gz')
- features_count = len(features)
- barcodes_count = len(flatten([j for i,j in barcodes]))
+ features_count = len(feature_index)
+ barcodes_count = sum(len(x) for _, x in barcodes_chunks)
mmsize = sum(getlen(f) for f in matrix_files)
- mmheader = '%%MatrixMarket matrix coordinate real general\n'
- mmtotal = f'{features_count} {barcodes_count} {mmsize}\n'
+ mmheader = b'%%MatrixMarket matrix coordinate real general\n'
+ mmtotal = f'{features_count} {barcodes_count} {mmsize}\n'.encode()
with gzip.GzipFile(matrix_out, 'wb', mtime=0) as mmout:
- mmout.write(mmheader.encode())
- mmout.write(mmtotal.encode())
+ mmout.write(mmheader)
+ mmout.write(mmtotal)
mtxstr = ' '.join(matrix_files)
cmd = f'zcat {mtxstr} | LC_ALL=C sort -k2,2n -k1,1n | gzip >> {matrix_out}'
run_shell_cmd(cmd)
@@ -378,18 +314,17 @@ def writeEC(ecdump_files, outdir):
ecdump_out = os.path.join(outdir, 'ec_dump.tsv.gz')
ecdumpstr = ' '.join(ecdump_files)
header = '\t'.join([
- 'BC_index',
+ 'Barcode_id',
'Barcode',
- 'EC_index',
- 'EC_name',
- 'Raw_UMIs',
- 'Raw_count',
- 'Dedup_UMIs',
- 'Dedup_count',
- 'Filtered_TE'
+ 'EqClass',
+ 'UMI',
+ 'Features',
+ 'Read_count',
+ 'Dedup_UMI',
+ 'Dedup_feature'
]) + '\n'
with gzip.GzipFile(ecdump_out, 'wb', mtime=0) as f:
f.write(header.encode())
- cmd = f'zcat {ecdumpstr} | LC_ALL=C sort -k1,1n -k2 | gzip >> {ecdump_out}'
+ cmd = f'zcat {ecdumpstr} | LC_ALL=C sort -k1,1n -k3,3n | gzip >> {ecdump_out}'
run_shell_cmd(cmd)
- return ecdump_out
+    return ecdump_out
diff --git a/irescue/em.py b/irescue/em.py
new file mode 100644
index 0000000..39443df
--- /dev/null
+++ b/irescue/em.py
@@ -0,0 +1,49 @@
+import numpy as np
+
+def e_step(matrix, counts):
+ """
+ Performs E-step of EM algorithm: proportionally assigns reads to features
+ based on relative feature abundances.
+ """
+ colsums = (matrix * counts).sum(axis=1)[:, np.newaxis]
+ out = matrix / colsums * counts
+    return out
+
+def m_step(matrix):
+ """
+ Performs M-step of EM algorithm: calculates feature abundances from read
+ counts proportionally distributed to features.
+ """
+ counts = matrix.sum(axis=0) / matrix.sum()
+    return counts
+
+def run_em(matrix, cycles=100):
+ """
+ Run Expectation-Maximization (EM) algorithm to redistribute read counts
+ across a set of features.
+
+ Parameters
+ ----------
+ matrix : array
+ Reads-features compatibility matrix.
+ cycles : int, optional
+ Number of EM cycles.
+
+ Returns
+ -------
+ out : list
+ Optimized relative feature abundances.
+ """
+
+ # calculate initial estimation of relative abundance.
+ # (let the sum of counts of features be 1,
+ # will be multiplied by the real UMI count later)
+ nFeatures = matrix.shape[1]
+ counts = np.array([1 / nFeatures] * nFeatures)
+
+ # run EM for n cycles
+ for _ in range(cycles):
+ e_matrix = e_step(matrix=matrix, counts=counts)
+ counts = m_step(matrix=e_matrix)
+
+    return counts
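+
+# Hypothetical usage sketch: with one UMI compatible with both features and
+# one unique to the second, abundances converge toward the unique feature:
+#   run_em(np.array([[1, 1], [0, 1]]))  # -> approximately [0., 1.]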
diff --git a/irescue/main.py b/irescue/main.py
index 179353d..13f1089 100644
--- a/irescue/main.py
+++ b/irescue/main.py
@@ -3,11 +3,11 @@
from irescue._version import __version__
from irescue._genomes import __genomes__
from irescue.misc import writerr, versiontuple, run_shell_cmd
-from irescue.misc import check_requirement, check_arguments, check_tags
+from irescue.misc import check_requirement, check_tags
from irescue.map import makeRmsk, getRefs, prepare_whitelist, isec, chrcat
from irescue.map import checkIndex
-from irescue.count import split_bc, parse_features, count, formatMM, writeEC
-import argparse, os
+from irescue.count import split_barcodes, index_features, run_count, formatMM, writeEC
+import argparse, os, sys
from multiprocessing import Pool
from functools import partial
from shutil import rmtree
@@ -22,101 +22,90 @@ def parseArguments():
" in scRNA-seq.",
epilog="Home page: https://github.com/bodegalab/irescue"
)
- parser.add_argument('-b', '--bam',
- required=True,
- metavar='FILE',
+ parser.add_argument('-b', '--bam', required=True, metavar='FILE',
help="scRNA-seq reads aligned to a reference genome "
"(required).")
- parser.add_argument('-r', '--regions',
- metavar='FILE',
+ parser.add_argument('-r', '--regions', metavar='FILE',
help="Genomic TE coordinates in bed format. "
"Takes priority over --genome (default: %(default)s).")
- parser.add_argument('-g', '--genome',
- metavar='STR',
+ parser.add_argument('-g', '--genome', metavar='STR',
+ choices=__genomes__.keys(),
help="Genome assembly symbol. One of: {} (default: "
"%(default)s).".format(', '.join(__genomes__)))
- parser.add_argument('-w', '--whitelist',
- metavar='FILE',
+ parser.add_argument('-w', '--whitelist', metavar='FILE',
help="Text file of filtered cell barcodes by e.g. "
"Cell Ranger, STARSolo or your gene expression "
"quantifier of choice (Recommended. "
"default: %(default)s).")
- parser.add_argument('-cb', '--CBtag',
- default='CB',
- metavar='STR',
+ parser.add_argument('-c', '--cb-tag', default='CB', metavar='STR',
help="BAM tag containing the cell barcode sequence "
"(default: %(default)s).")
- parser.add_argument('-umi', '--UMItag',
- default='UR',
- metavar='STR',
+ parser.add_argument('-u', '--umi-tag', default='UR', metavar='STR',
help="BAM tag containing the UMI sequence "
"(default: %(default)s).")
- parser.add_argument('-p', '--threads',
- type=int,
- default=1,
- metavar='CPUS ',
+ parser.add_argument('-p', '--threads', type=int, default=1, metavar='CPUS',
help="Number of cpus to use (default: %(default)s).")
- parser.add_argument('-o', '--outdir',
- default='IRescue_out',
- metavar='DIR',
+ parser.add_argument('-o', '--outdir', default='irescue_out', metavar='DIR',
help="Output directory name (default: %(default)s).")
- parser.add_argument('--min-bp-overlap',
- type=int,
- metavar='INT',
+ parser.add_argument('--min-bp-overlap', type=int, metavar='INT',
help="Minimum overlap between read and TE as number "
"of nucleotides (Default: disabled).")
- parser.add_argument('--min-fraction-overlap',
- type=float,
- metavar='FLOAT',
+ parser.add_argument('--min-fraction-overlap', type=float, metavar='FLOAT',
+ choices=[x/100 for x in range(101)],
help="Minimum overlap between read and TE"
" as a fraction of read's alignment"
" (i.e. 0.00 <= NUM <= 1.00) (Default: disabled).")
- parser.add_argument('--dumpEC',
- action='store_true',
+ parser.add_argument('--dump-ec', action='store_true',
help="Write a description log file of Equivalence "
"Classes.")
- parser.add_argument('--integers',
- action='store_true',
+ parser.add_argument('--integers', action='store_true',
help="Use if integers count are needed for "
"downstream analysis.")
- parser.add_argument('--samtools',
- default='samtools',
- metavar='PATH',
+ parser.add_argument('--samtools', default='samtools', metavar='PATH',
help="Path to samtools binary, in case it's not in "
"PATH (Default: %(default)s).")
- parser.add_argument('--bedtools',
- default='bedtools',
- metavar='PATH',
+ parser.add_argument('--bedtools', default='bedtools', metavar='PATH',
help="Path to bedtools binary, in case it's not in "
"PATH (Default: %(default)s).")
- parser.add_argument('--no-tags-check',
- action='store_true',
+ parser.add_argument('--no-tags-check', action='store_true',
help="Suppress checking for CBtag and UMItag "
"presence in bam file.")
- parser.add_argument('--keeptmp',
- action='store_true',
- help="Keep temporary files.")
- parser.add_argument('--tmpdir',
- default='IRescue_tmp',
- metavar='DIR',
- help="Directory to store temporary files "
- "(default: %(default)s).")
- parser.add_argument('-v', '--verbose',
- action='store_true',
+ parser.add_argument('--keeptmp', action='store_true',
+ help="Keep temporary files under /tmp.")
+ parser.add_argument('-v', '--verbose', action='store_true',
help="Writes a lot of stuff to stderr, such as "
"chromosomes as they are mapped and cell barcodes "
"as they are processed.")
- parser.add_argument('-V', '--version',
- action='version',
+ parser.add_argument('-V', '--version', action='version',
version='%(prog)s {}'.format(__version__),
help="Print software's version and exit.")
return parser
def main():
+
+ # Parse and print arguments
parser = parseArguments()
- args = parser.parse_args()
- args = check_arguments(args)
+ args = parser.parse_args(args=None if sys.argv[1:] else ['--help'])
+ argstr = '\n'.join(f' {k}: {v}' for k, v in args.__dict__.items())
+ sys.stderr.write(f" IRescue version {__version__}\n{argstr}\n")
+
+ dirs = {
+ 'out': args.outdir,
+ 'tmp': os.path.join(args.outdir, 'tmp'),
+ 'mex': os.path.join(args.outdir, 'counts')
+ }
+
+    #####################
+    # Preliminary steps #
+    #####################
+
+ writerr("Running preliminary checks.")
# Check requirements
check_requirement(
@@ -134,28 +123,33 @@ def main():
# Check if the selected cell barcode and UMI tags are present in bam file.
if not args.no_tags_check:
- check_tags(bamFile=args.bam, CBtag=args.CBtag, UMItag=args.UMItag,
+ check_tags(bamFile=args.bam, CBtag=args.cb_tag, UMItag=args.umi_tag,
nLines=999999, exit_with_error=True, verbose=args.verbose)
# Check for bam index file. If not present, will build an index.
checkIndex(args.bam, verbose=args.verbose)
-
- writerr('IRescue job starts')
-
+
# create directories
- os.makedirs(args.tmpdir, exist_ok=True)
- os.makedirs(args.outdir, exist_ok=True)
+ for v in dirs.values():
+ os.makedirs(v, exist_ok=True)
+
+
+ ###########
+ # Mapping #
+ ###########
+
+ writerr("Running mapping step.")
# set regions object (provided or downloaded bed file)
regions = makeRmsk(regions=args.regions, genome=args.genome,
- genomes=__genomes__, tmpdir=args.tmpdir,
+ genomes=__genomes__, tmpdir=dirs['tmp'],
outname='rmsk.bed')
# get list of reference names from bam
chrNames = getRefs(args.bam, regions)
# decompress whitelist if compressed
- whitelist = prepare_whitelist(args.whitelist, args.tmpdir)
+ whitelist = prepare_whitelist(args.whitelist, dirs['tmp'])
# Allocate threads
if args.threads > 1:
@@ -168,8 +162,8 @@ def main():
send=args.verbose
)
isecFun = partial(
- isec, args.bam, regions, whitelist, args.CBtag, args.UMItag,
- args.min_bp_overlap, args.min_fraction_overlap, args.tmpdir,
+ isec, args.bam, regions, whitelist, args.cb_tag, args.umi_tag,
+ args.min_bp_overlap, args.min_fraction_overlap, dirs['tmp'],
args.samtools, args.bedtools, args.verbose
)
if args.threads > 1:
@@ -179,20 +173,27 @@ def main():
# concatenate intersection results
mappings_file, barcodes_file, features_file = chrcat(
- isecFiles, threads=args.threads, outdir=args.outdir,
- tmpdir=args.tmpdir, verbose=args.verbose
+ isecFiles, threads=args.threads, outdir=dirs['mex'],
+ tmpdir=dirs['tmp'], bedtools=args.bedtools, verbose=args.verbose
)
+
+ #########
+ # Count #
+ #########
+
+ writerr("Running count step.")
+
# calculate number of mappings per process
- bc_per_thread = list(split_bc(barcodes_file, args.threads))
+ bc_per_thread = list(split_barcodes(barcodes_file, args.threads))
# parse features
- ftlist = dict(parse_features(features_file))
+ feature_index = index_features(features_file)
# calculate TE counts
countFun = partial(
- count, mappings_file, args.outdir, args.tmpdir, ftlist, args.integers,
- args.dumpEC, args.verbose
+ run_count, mappings_file, feature_index, dirs['tmp'],
+ args.dump_ec, args.verbose
)
if args.threads > 1:
mtxFiles = pool.map(countFun, bc_per_thread)
@@ -208,16 +209,15 @@ def main():
matrix_files = [ i for i, j in mtxFiles]
ecdump_files = [ j for i, j in mtxFiles]
matrix_file = formatMM(
- matrix_files, outdir=args.outdir, features=ftlist,
- barcodes=bc_per_thread
+ matrix_files, feature_index, bc_per_thread, dirs['mex']
)
writerr(f'Writing sparse matrix to {matrix_file}')
- if args.dumpEC:
- ecdump_file = writeEC(ecdump_files, outdir=args.outdir)
+ if args.dump_ec:
+ ecdump_file = writeEC(ecdump_files, outdir=dirs['out'])
writerr(f'Writing Equivalence Classes to {ecdump_file}')
if not args.keeptmp:
writerr(f'Cleaning up temporary files.', send=args.verbose)
- rmtree(args.tmpdir)
+ rmtree(dirs['tmp'])
writerr('Done.')
diff --git a/irescue/map.py b/irescue/map.py
index f82f9ea..9777af9 100644
--- a/irescue/map.py
+++ b/irescue/map.py
@@ -57,12 +57,6 @@ def makeRmsk(regions, genome, genomes, tmpdir, outname):
# if no repeatmasker file is provided, and a genome assembly name is
# provided, download and prepare a rmsk.bed file
elif genome:
- if not genome in genomes:
- writerr(
- "ERROR: Genome assembly name shouldbe one of: "
- f"{', '.join(genomes.keys())}",
- error=True
- )
url, header_lines = genomes[genome]
writerr(
"Downloading and parsing RepeatMasker annotation for "
@@ -101,7 +95,7 @@ def makeRmsk(regions, genome, genomes, tmpdir, outname):
if famclass.split('/')[0] in fams_to_skip:
continue
# concatenate family and class with subfamily
- subfamily += '~' + famclass
+ subfamily += '#' + famclass
score = lst[0]
chr, start, end = lst[4:7]
# make coordinates 0-based
@@ -172,7 +166,7 @@ def isec(bamFile, bedFile, whitelist, CBtag, UMItag, bpOverlap, fracOverlap,
os.makedirs(isecdir, exist_ok=True)
refFile = os.path.join(refdir, chrom + '.bed.gz')
- isecFile = os.path.join(isecdir, chrom + '.isec.bed.gz')
+ isecFile = os.path.join(isecdir, chrom + '.isec.txt.gz')
# split bed file by chromosome
sort = 'LC_ALL=C sort -k1,1 -k2,2n --buffer-size=1G'
@@ -210,8 +204,8 @@ def isec(bamFile, bedFile, whitelist, CBtag, UMItag, bpOverlap, fracOverlap,
# remove mate information from read name
cmd += ' { sub(/\/[12]$/,"",$4); '
# concatenate CB and UMI with feature name
- cmd += ' n=split($4,qname,/\//); $4=qname[n-1]"\\t"qname[n]"\\t"$16; '
- cmd += ' print $4 }\' '
+ cmd += ' n=split($4,qname,/\//); '
+ cmd += ' print qname[n-1]"\\t"qname[n]"\\t"qname[1]"\\t"$16 }\' '
cmd += f' | gzip > {isecFile}'
writerr(f'Extracting {chrom} reference', send=verbose)
@@ -223,24 +217,34 @@ def isec(bamFile, bedFile, whitelist, CBtag, UMItag, bpOverlap, fracOverlap,
return isecFile
# Concatenate and sort data obtained from isec()
-def chrcat(filesList, threads, outdir, tmpdir, verbose):
+def chrcat(filesList, threads, outdir, tmpdir, bedtools, verbose):
os.makedirs(outdir, exist_ok=True)
- mappings_file = os.path.join(tmpdir, 'cb_umi_te.bed.gz')
+ mappings_file = os.path.join(tmpdir, 'mappings.tsv.gz')
barcodes_file = os.path.join(outdir, 'barcodes.tsv.gz')
features_file = os.path.join(outdir, 'features.tsv.gz')
bedFiles = ' '.join(filesList)
- cmd0 = f'zcat {bedFiles} '
- cmd0 += f' | LC_ALL=C sort --parallel {threads} --buffer-size 2G '
- cmd0 += f' | gzip > {mappings_file} '
+ sort_threads = int(threads / 2 - 1)
+ sort_threads = sort_threads if sort_threads>0 else 1
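+    # split the available threads between the two sort invocations below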
+
+ # sort and summarize UMI-READ-TE mappings
+ sort_res = f'--parallel {sort_threads} --buffer-size 2G'
+ cmd0 = f'zcat {bedFiles}'
+ # input: "CB UMI READ FEAT"
+ cmd0 += f' | LC_ALL=C sort -u {sort_res}'
+ cmd0 += f' | {bedtools} groupby -g 1,2,3 -c 4 -o distinct'
+ # result: "CB UMI READ FEATs"
+ cmd0 += f' | LC_ALL=C sort -k1,2 -k4,4 {sort_res}'
+ cmd0 += f' | {bedtools} groupby -g 1,2,4 -c 3 -o count_distinct'
+ # result: "CB UMI FEATs count"
+ cmd0 += f' | gzip > {mappings_file}'
+
+ # write barcodes.tsv.gz file
cmd1 = f'zcat {mappings_file} | cut -f1 | uniq | gzip > {barcodes_file} '
+
+ # write features.tsv.gz file
cmd2 = f'zcat {mappings_file} '
- cmd2 += ' | gawk \'!x[$3]++ { '
- cmd2 += ' split($3,a,"~"); '
- # avoid subfamilies with the same name
- cmd2 += ' if(a[1] in sf) { sf[a[1]]+=1 } else { sf[a[1]] }; '
- cmd2 += ' if(length(a)<2) { a[2]=a[1] }; '
- cmd2 += ' print a[1] sf[a[1]] "\\t" a[2] "\\tGene Expression" '
- cmd2 += ' }\' '
+ cmd2 += ' | cut -f3 | sed \'s/,/\\n/g\' | gawk \'!x[$1]++ { '
+ cmd2 += ' print $1"\\t"gensub(/#.+/,"",1,$1)"\\tGene Expression" }\' '
cmd2 += f' | LC_ALL=C sort -u | gzip > {features_file} '
writerr('Concatenating mappings', send=verbose)
diff --git a/irescue/misc.py b/irescue/misc.py
index 7a49285..aea6b03 100644
--- a/irescue/misc.py
+++ b/irescue/misc.py
@@ -37,18 +37,6 @@ def versiontuple(version):
"""
return tuple(map(int, version.split('.')))
-def check_arguments(args):
- """
- Check validity of arguments.
- """
- if isinstance(args.min_fraction_overlap, (int, float)):
- if 0 <= args.min_fraction_overlap <= 1:
- pass
- else:
- writerr("ERROR: --min-fraction-overlap must be a floating point "
- "number between 0 and 1.", error=True)
- return args
-
def check_requirement(cmd, required_version, parser, verbose):
"""
Check if the required version for a software has been installed.
@@ -94,7 +82,7 @@ def writerr(msg, error=False, send=True):
Decides if the message should be sent (useful for verbose messages).
"""
if send:
- timelog = datetime.now().strftime("%m/%d/%Y - %H:%M:%S")
+ timelog = datetime.now().strftime("%Y/%m/%d - %H:%M:%S")
message = f'[{timelog}] '
if not msg[-1]=='\n':
msg += '\n'
@@ -143,12 +131,6 @@ def getlen(file):
f.close()
return out
-def flatten(x):
- """
- Flatten a list of sublists.
- """
- return [item for sublist in x for item in sublist]
-
def check_tags(
bamFile, CBtag, UMItag,
nLines=None, exit_with_error=True, verbose=False
@@ -216,22 +198,15 @@ def check_tags(
else:
return(False)
-def iupac_nt_code(nts):
- """
- Return the IUPAC code correspondent to a set of input nucleotides.
- """
- codes = {
- 'R': {'A', 'G'},
- 'Y': {'C', 'T'},
- 'S': {'G', 'C'},
- 'W': {'A', 'T'},
- 'K': {'G', 'T'},
- 'M': {'A', 'C'},
- 'B': {'C', 'G', 'T'},
- 'D': {'A', 'G', 'T'},
- 'H': {'A', 'C', 'T'},
- 'V': {'A', 'C', 'G'},
- 'N': {'A', 'C', 'G', 'T'}
- }
- out = [k for k, v in codes.items() if v == set(nts)][0]
- return out
+def get_ranges(num, div):
+    """
+    Yields div contiguous ranges that together cover range(num),
+    e.g. get_ranges(10, 3) -> range(0, 3), range(3, 6), range(6, 10).
+    """
+    # guard against num < div, which would make the step size zero
+    split = max(1, int(num / div))
+    for i in range(0, num, split):
+        j = i + split
+        if j > num - split:
+            j = num
+            yield range(i, j)
+            break
+        yield range(i, j)
diff --git a/irescue/network.py b/irescue/network.py
new file mode 100644
index 0000000..f274e3a
--- /dev/null
+++ b/irescue/network.py
@@ -0,0 +1,69 @@
+#!/usr/bin/env python
+
+# NB: This module includes partly modified third-party code distributed
+# under the license below.
+
+##############################################################################
+# The MIT License (MIT)
+
+# Copyright (c) 2015 CGAT
+
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+##############################################################################
+
+from collections import defaultdict
+
+def get_substr_slices(umi_length, idx_size):
+ '''
+ Create slices to split a UMI into approximately equal size substrings
+    Returns a list of tuples that can be passed to the slice function
+ '''
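+    # e.g. umi_length=10, idx_size=2 -> [(0, 5), (5, 10)]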
+ cs, r = divmod(umi_length, idx_size)
+ sub_sizes = [cs + 1] * r + [cs] * (idx_size - r)
+ offset = 0
+ slices = []
+ for s in sub_sizes:
+ slices.append((offset, offset + s))
+ offset += s
+ return slices
+
+def build_substr_idx(equivalence_classes, length, threshold):
+ '''
+ Group equivalence classes into subgroups having a common substring
+ '''
+ slices = get_substr_slices(length, threshold+1)
+ substr_idx = {k: defaultdict(set) for k in slices}
+ for idx in slices:
+ for ec in equivalence_classes:
+ sub = ec.umi[slice(*idx)]
+ substr_idx[idx][sub].add(ec)
+ return substr_idx
+
+def gen_ec_pairs(equivalence_classes, substr_idx):
+ '''
+    Yields pairs of equivalence classes using the index from build_substr_idx()
+ '''
+ for i, ec in enumerate(equivalence_classes, start=1):
+ neighbours = set()
+ for idx, substr_map in substr_idx.items():
+ sub = ec.umi[slice(*idx)]
+ neighbours = neighbours.union(substr_map[sub])
+ neighbours.difference_update(equivalence_classes[:i])
+ for nbr in neighbours:
+            yield ec, nbr
diff --git a/pyproject.toml b/pyproject.toml
index 123ca38..9f2a6c8 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -32,6 +32,7 @@ dependencies = [
"numpy >= 1.20.2",
"pysam >= 0.16.0.1",
"requests >= 2.27.1",
+ "networkx >= 3.1",
]
dynamic = ["version"]
diff --git a/tests/data/rmsk.bed.gz b/tests/data/rmsk.bed.gz
index d032937..422df4a 100644
Binary files a/tests/data/rmsk.bed.gz and b/tests/data/rmsk.bed.gz differ
diff --git a/tests/test.yml b/tests/test.yml
index 291c46d..4e55120 100644
--- a/tests/test.yml
+++ b/tests/test.yml
@@ -1,48 +1,48 @@
- name: base
command: irescue -b ./tests/data/Aligned.sortedByCoord.out.bam -r ./tests/data/rmsk.bed.gz --keeptmp -v
files:
- - path: "IRescue_out/barcodes.tsv.gz"
+ - path: "irescue_out/counts/barcodes.tsv.gz"
md5sum: 1a74fa12e65ac1703bbe61282854f151
- - path: "IRescue_out/features.tsv.gz"
- md5sum: e8bf21611afd1f40d722ed985f4e3392
- - path: "IRescue_out/matrix.mtx.gz"
- md5sum: 04ddbd538c796f019f37d3048b159a2f
+ - path: "irescue_out/counts/features.tsv.gz"
+ md5sum: ae84bc368a289e070b754030a65d69b4
+ - path: "irescue_out/counts/matrix.mtx.gz"
+ md5sum: ca147b42af250be7c47c4a748693ca97
- name: genome
tags:
- genome
command: irescue -b ./tests/data/Aligned.sortedByCoord.out.bam -g test --keeptmp -v
files:
- - path: "IRescue_out/barcodes.tsv.gz"
+ - path: "irescue_out/counts/barcodes.tsv.gz"
md5sum: 1a74fa12e65ac1703bbe61282854f151
- - path: "IRescue_out/features.tsv.gz"
- md5sum: e8bf21611afd1f40d722ed985f4e3392
- - path: "IRescue_out/matrix.mtx.gz"
- md5sum: 04ddbd538c796f019f37d3048b159a2f
+ - path: "irescue_out/counts/features.tsv.gz"
+ md5sum: ae84bc368a289e070b754030a65d69b4
+ - path: "irescue_out/counts/matrix.mtx.gz"
+ md5sum: ca147b42af250be7c47c4a748693ca97
- name: multi
tags:
- multi
command: irescue -b ./tests/data/Aligned.sortedByCoord.out.bam -r ./tests/data/rmsk.bed.gz -p 2 --keeptmp -v
files:
- - path: "IRescue_out/barcodes.tsv.gz"
+ - path: "irescue_out/counts/barcodes.tsv.gz"
md5sum: 1a74fa12e65ac1703bbe61282854f151
- - path: "IRescue_out/features.tsv.gz"
- md5sum: e8bf21611afd1f40d722ed985f4e3392
- - path: "IRescue_out/matrix.mtx.gz"
- md5sum: 04ddbd538c796f019f37d3048b159a2f
+ - path: "irescue_out/counts/features.tsv.gz"
+ md5sum: ae84bc368a289e070b754030a65d69b4
+ - path: "irescue_out/counts/matrix.mtx.gz"
+ md5sum: ca147b42af250be7c47c4a748693ca97
- name: whitelist
tags:
- whitelist
command: irescue -b ./tests/data/Aligned.sortedByCoord.out.bam -r ./tests/data/rmsk.bed.gz -w ./tests/data/whitelist.txt --keeptmp -v
files:
- - path: "IRescue_out/barcodes.tsv.gz"
+ - path: "irescue_out/counts/barcodes.tsv.gz"
md5sum: 95dccc15cbee4feeeae2fbce4d7b41ad
- - path: "IRescue_out/features.tsv.gz"
- md5sum: 2dcec6f4aead5faba9c1af44b0129b55
- - path: "IRescue_out/matrix.mtx.gz"
- md5sum: 85c61d1df6ccadf83eafc6bc36a21c89
+ - path: "irescue_out/counts/features.tsv.gz"
+ md5sum: 65fb8381a658a4eb4e5d0a575c67818d
+ - path: "irescue_out/counts/matrix.mtx.gz"
+ md5sum: d4f60bc056ea189c7473a3624f3c2970
- name: multi whitelist
tags:
@@ -50,65 +50,65 @@
- whitelist
command: irescue -b ./tests/data/Aligned.sortedByCoord.out.bam -r ./tests/data/rmsk.bed.gz -w ./tests/data/whitelist.txt --keeptmp -v -p 2
files:
- - path: "IRescue_out/barcodes.tsv.gz"
+ - path: "irescue_out/counts/barcodes.tsv.gz"
md5sum: 95dccc15cbee4feeeae2fbce4d7b41ad
- - path: "IRescue_out/features.tsv.gz"
- md5sum: 2dcec6f4aead5faba9c1af44b0129b55
- - path: "IRescue_out/matrix.mtx.gz"
- md5sum: 85c61d1df6ccadf83eafc6bc36a21c89
+ - path: "irescue_out/counts/features.tsv.gz"
+ md5sum: 65fb8381a658a4eb4e5d0a575c67818d
+ - path: "irescue_out/counts/matrix.mtx.gz"
+ md5sum: d4f60bc056ea189c7473a3624f3c2970
- name: ecdump
tags:
- ecdump
- command: irescue -b ./tests/data/Aligned.sortedByCoord.out.bam -r ./tests/data/rmsk.bed.gz --keeptmp -v --dumpEC
+ command: irescue -b ./tests/data/Aligned.sortedByCoord.out.bam -r ./tests/data/rmsk.bed.gz --keeptmp -v --dump-ec
files:
- - path: "IRescue_out/barcodes.tsv.gz"
+ - path: "irescue_out/counts/barcodes.tsv.gz"
md5sum: 1a74fa12e65ac1703bbe61282854f151
- - path: "IRescue_out/features.tsv.gz"
- md5sum: e8bf21611afd1f40d722ed985f4e3392
- - path: "IRescue_out/matrix.mtx.gz"
- md5sum: 04ddbd538c796f019f37d3048b159a2f
- - path: "IRescue_out/ec_dump.tsv.gz"
- md5sum: 2fbcb954fb48065c6b67a84001b6bc34
+ - path: "irescue_out/counts/features.tsv.gz"
+ md5sum: ae84bc368a289e070b754030a65d69b4
+ - path: "irescue_out/counts/matrix.mtx.gz"
+ md5sum: ca147b42af250be7c47c4a748693ca97
+ - path: "irescue_out/ec_dump.tsv.gz"
+ md5sum: d71ee82b25107d4e104d313efb4be134
- name: multi ecdump
tags:
- multi
- ecdump
- command: irescue -b ./tests/data/Aligned.sortedByCoord.out.bam -r ./tests/data/rmsk.bed.gz --keeptmp -v -p 2 --dumpEC
+ command: irescue -b ./tests/data/Aligned.sortedByCoord.out.bam -r ./tests/data/rmsk.bed.gz --keeptmp -v -p 2 --dump-ec
files:
- - path: "IRescue_out/barcodes.tsv.gz"
+ - path: "irescue_out/counts/barcodes.tsv.gz"
md5sum: 1a74fa12e65ac1703bbe61282854f151
- - path: "IRescue_out/features.tsv.gz"
- md5sum: e8bf21611afd1f40d722ed985f4e3392
- - path: "IRescue_out/matrix.mtx.gz"
- md5sum: 04ddbd538c796f019f37d3048b159a2f
- - path: "IRescue_out/ec_dump.tsv.gz"
- md5sum: 2fbcb954fb48065c6b67a84001b6bc34
+ - path: "irescue_out/counts/features.tsv.gz"
+ md5sum: ae84bc368a289e070b754030a65d69b4
+ - path: "irescue_out/counts/matrix.mtx.gz"
+ md5sum: ca147b42af250be7c47c4a748693ca97
+ - path: "irescue_out/ec_dump.tsv.gz"
+ md5sum: d71ee82b25107d4e104d313efb4be134
- name: bp
tags:
- bp
command: irescue -b ./tests/data/Aligned.sortedByCoord.out.bam -r ./tests/data/rmsk.bed.gz --keeptmp -v --min-bp-overlap 10
files:
- - path: "IRescue_out/barcodes.tsv.gz"
+ - path: "irescue_out/counts/barcodes.tsv.gz"
md5sum: 7433e88e94aec2f16a20459275188f1f
- - path: "IRescue_out/features.tsv.gz"
- md5sum: 12ff16aee1a5e9847ed96534b3764d13
- - path: "IRescue_out/matrix.mtx.gz"
- md5sum: 39b3ee6dbffd61a68569b3b30dcaf972
+ - path: "irescue_out/counts/features.tsv.gz"
+ md5sum: 434ff68c92d1b8dd718269a1cd974f99
+ - path: "irescue_out/counts/matrix.mtx.gz"
+ md5sum: 30fe31ed8976bd002d86bcd956d25855
- name: fraction
tags:
- fraction
command: irescue -b ./tests/data/Aligned.sortedByCoord.out.bam -r ./tests/data/rmsk.bed.gz --keeptmp -v --min-fraction-overlap 0.5
files:
- - path: "IRescue_out/barcodes.tsv.gz"
+ - path: "irescue_out/counts/barcodes.tsv.gz"
md5sum: 4de44d3e4a851392a48ccabfee5bb6fc
- - path: "IRescue_out/features.tsv.gz"
- md5sum: 6ee6ded0563e8e138fb1d5c958cedeee
- - path: "IRescue_out/matrix.mtx.gz"
- md5sum: 345b8aff9c00f607ea5a305bed569653
+ - path: "irescue_out/counts/features.tsv.gz"
+ md5sum: 927d00f20e4e65b8d46e761d406b69ff
+ - path: "irescue_out/counts/matrix.mtx.gz"
+ md5sum: 85b8e8ae7696ff12ffa7e5ac86600fa1
- name: bp fraction
tags:
@@ -116,9 +116,9 @@
- fraction
command: irescue -b ./tests/data/Aligned.sortedByCoord.out.bam -r ./tests/data/rmsk.bed.gz --keeptmp -v --min-bp-overlap 10 --min-fraction-overlap 0.5
files:
- - path: "IRescue_out/barcodes.tsv.gz"
+ - path: "irescue_out/counts/barcodes.tsv.gz"
md5sum: 4de44d3e4a851392a48ccabfee5bb6fc
- - path: "IRescue_out/features.tsv.gz"
- md5sum: c95db95604d1731d2908f08eeaf8ded1
- - path: "IRescue_out/matrix.mtx.gz"
- md5sum: 40fd2a331d7328d4a4a5428307f8adaf
+ - path: "irescue_out/counts/features.tsv.gz"
+ md5sum: f304e63657f73eeec0edffed68490b6c
+ - path: "irescue_out/counts/matrix.mtx.gz"
+ md5sum: 336e5a5edfad998bc7d64cf0e68cc897