feat: jira connector (cloud) #1238

Merged 58 commits on Sep 6, 2023. The diff below shows changes from 41 of the 58 commits.

Commits
9578ead
add the first working version of jira connector
ahmetmeleq Aug 28, 2023
a8177d3
ingest issue data from a given list of jira components (projects, boa…
ahmetmeleq Aug 28, 2023
628860f
remove support on getting issues via epic_ids
ahmetmeleq Aug 29, 2023
5e7934f
extend scroll decorator to be able to ingest issues via board ids
ahmetmeleq Aug 29, 2023
682990b
add support for ingesting via issue ids, add issue ids ingest example…
ahmetmeleq Aug 29, 2023
466d2df
add ingest test and code on dependencies
ahmetmeleq Aug 30, 2023
b144be0
add workflow credentials
ahmetmeleq Aug 30, 2023
c084b2d
Merge branch 'main' into ahmet/ingest-jira-connector
ahmetmeleq Aug 30, 2023
4f340ee
changelog and version
ahmetmeleq Aug 30, 2023
77149fc
decomment box connector ingest test (temporarily)
ahmetmeleq Aug 30, 2023
436098f
re-enable box ingest tests
ahmetmeleq Aug 30, 2023
10caab3
rename secrets
ahmetmeleq Aug 30, 2023
084b7b8
Merge branch 'main' into ahmet/ingest-jira-connector
ahmetmeleq Aug 30, 2023
ddf5b5c
remove comment, remove jql arguments
ahmetmeleq Aug 31, 2023
25de658
Merge branch 'ahmet/ingest-jira-connector' of https://github.com/Unst…
ahmetmeleq Aug 31, 2023
0b74eed
clean examples
ahmetmeleq Aug 31, 2023
b6da3cd
update example
ahmetmeleq Aug 31, 2023
68d70be
help messages for arguments
ahmetmeleq Aug 31, 2023
91c7f1c
create rst documentation, fix small typos in other rst docs
ahmetmeleq Aug 31, 2023
29389ca
initial work on metadata fields
ahmetmeleq Aug 31, 2023
048b44f
add session handles
ahmetmeleq Sep 1, 2023
7eedbc2
add metadata properties
ahmetmeleq Sep 1, 2023
d6e65d3
Merge branch 'main' into ahmet/ingest-jira-connector
ahmetmeleq Sep 1, 2023
2d7a76d
update secret name
ahmetmeleq Sep 1, 2023
b0ecc25
Merge branch 'main' into ahmet/ingest-jira-connector
ahmetmeleq Sep 1, 2023
fdd4f84
add registry name and extras parameter for requires_dependencies
ahmetmeleq Sep 1, 2023
a34318e
change confluence extras name
ahmetmeleq Sep 1, 2023
d20b77e
implement exceptions for session handle
ahmetmeleq Sep 1, 2023
3fba0f1
feat: jira connector (cloud) <- Ingest test fixtures update (#1267)
ryannikolaidis Sep 1, 2023
b75a481
fix type checking issue, update record locator
ahmetmeleq Sep 1, 2023
61d7ecd
Merge branch 'ahmet/ingest-jira-connector' of https://github.com/Unst…
ahmetmeleq Sep 1, 2023
49346d3
update ingest test comments
ahmetmeleq Sep 1, 2023
00d2be8
comment --metadata-exclude
ahmetmeleq Sep 1, 2023
f887948
shellcheck fix
ahmetmeleq Sep 1, 2023
cedcf82
disable metadata fields for debugging
ahmetmeleq Sep 1, 2023
dadf14b
Merge branch 'main' into ahmet/ingest-jira-connector
ahmetmeleq Sep 1, 2023
0a29b18
uncomment --metadata-exclude
ahmetmeleq Sep 2, 2023
ca3f392
fix metadata properties
ahmetmeleq Sep 2, 2023
21b45c5
feat: jira connector (cloud) <- Ingest test fixtures update (#1289)
ryannikolaidis Sep 2, 2023
4dfe8c1
Merge branch 'main' into ahmet/ingest-jira-connector
ahmetmeleq Sep 4, 2023
fc5aa6f
Merge branch 'main' into ahmet/ingest-jira-connector
ahmetmeleq Sep 4, 2023
b277302
remove ingest_all_issues variable and calculate on the fly
ahmetmeleq Sep 5, 2023
a696ef0
add source_url property, fix bug on session handles
ahmetmeleq Sep 5, 2023
2753fb9
make tidy
ahmetmeleq Sep 5, 2023
65efc0f
version
ahmetmeleq Sep 5, 2023
4a6428e
feat: jira connector (cloud) <- Ingest test fixtures update (#1301)
ryannikolaidis Sep 5, 2023
9c5edd4
refactor metadata properties
ahmetmeleq Sep 5, 2023
167f62e
refactor error handling for get_file
ahmetmeleq Sep 5, 2023
d246049
make exists property independent of other methods
ahmetmeleq Sep 5, 2023
ce1a419
change grouping_folder_name into a property, fix typo
ahmetmeleq Sep 5, 2023
385b5c9
add properties for issue data and parsed issue fields, utilize cache …
ahmetmeleq Sep 5, 2023
1cea0e7
utilize cache to create self.document
ahmetmeleq Sep 5, 2023
9bf2990
rename get_metadata_fields to metadata_fields
ahmetmeleq Sep 5, 2023
4224f03
include exists property within metadata fields rather than standalone
ahmetmeleq Sep 5, 2023
0d50ec2
ditch cached property for calculation within exists
ahmetmeleq Sep 5, 2023
7f1af86
add base_url to record_locator
ahmetmeleq Sep 5, 2023
6e1fd88
re-add cached property for calculation within exists
ahmetmeleq Sep 5, 2023
d3b1367
feat: jira connector (cloud) <- Ingest test fixtures update (#1306)
ryannikolaidis Sep 6, 2023
Files changed
3 changes: 3 additions & 0 deletions .github/workflows/ci.yml
@@ -277,6 +277,8 @@ jobs:
DROPBOX_REFRESH_TOKEN: ${{ secrets.DROPBOX_REFRESH_TOKEN }}
GCP_INGEST_SERVICE_KEY: ${{ secrets.GCP_INGEST_SERVICE_KEY }}
GH_READ_ONLY_ACCESS_TOKEN: ${{ secrets.GH_READ_ONLY_ACCESS_TOKEN }}
JIRA_INGEST_API_TOKEN: ${{ secrets.JIRA_INGEST_API_TOKEN }}
JIRA_INGEST_USER_EMAIL: ${{ secrets.JIRA_INGEST_USER_EMAIL }}
MS_CLIENT_CRED: ${{ secrets.MS_CLIENT_CRED }}
MS_CLIENT_ID: ${{ secrets.MS_CLIENT_ID }}
MS_TENANT_ID: ${{ secrets.MS_TENANT_ID }}
@@ -314,6 +316,7 @@ jobs:
make install-ingest-google-drive
make install-ingest-github
make install-ingest-gitlab
make install-ingest-jira
make install-ingest-onedrive
make install-ingest-outlook
make install-ingest-salesforce
3 changes: 3 additions & 0 deletions .github/workflows/ingest-test-fixtures-update-pr.yml
@@ -68,6 +68,8 @@ jobs:
DROPBOX_REFRESH_TOKEN: ${{ secrets.DROPBOX_REFRESH_TOKEN }}
GCP_INGEST_SERVICE_KEY: ${{ secrets.GCP_INGEST_SERVICE_KEY }}
GH_READ_ONLY_ACCESS_TOKEN: ${{ secrets.GH_READ_ONLY_ACCESS_TOKEN }}
JIRA_INGEST_API_TOKEN: ${{ secrets.JIRA_INGEST_API_TOKEN }}
JIRA_INGEST_USER_EMAIL: ${{ secrets.JIRA_INGEST_USER_EMAIL }}
MS_CLIENT_CRED: ${{ secrets.MS_CLIENT_CRED }}
MS_CLIENT_ID: ${{ secrets.MS_CLIENT_ID }}
MS_TENANT_ID: ${{ secrets.MS_TENANT_ID }}
@@ -105,6 +107,7 @@ jobs:
make install-ingest-google-drive
make install-ingest-github
make install-ingest-gitlab
make install-ingest-jira
make install-ingest-onedrive
make install-ingest-outlook
make install-ingest-salesforce
7 changes: 5 additions & 2 deletions CHANGELOG.md
@@ -1,14 +1,17 @@
## 0.10.13-dev1
## 0.10.13-dev2

### Enhancements

* Updated documentation: Added back support doc types for partitioning, more Python codes in the API page, RAG definition, and use case.

### Features

* Add Jira Connector to be able to pull issues from a Jira organization

### Fixes

* Ingest error handling to properly raise errors when wrapped


## 0.10.12

### Enhancements
4 changes: 4 additions & 0 deletions Makefile
@@ -192,6 +192,10 @@ install-ingest-notion:
install-ingest-salesforce:
python3 -m pip install -r requirements/ingest-salesforce.txt

.PHONY: install-ingest-jira
install-ingest-jira:
python3 -m pip install -r requirements/ingest-jira.txt

.PHONY: install-unstructured-inference
install-unstructured-inference:
python3 -m pip install -r requirements/local-inference.txt
10 changes: 5 additions & 5 deletions docs/source/upstream_connectors/confluence.rst
@@ -1,6 +1,6 @@
Confluence
==========
Connect Confluence to your preprocessing pipeline, and batch process all your documents using ``unstructured-ingest`` to store structured outputs locally on your filesystem.
Connect Confluence to your preprocessing pipeline, and batch process all your documents using ``unstructured-ingest`` to store structured outputs locally on your filesystem.

First you'll need to install the Confluence dependencies as shown here.

@@ -34,7 +34,7 @@ Run Locally

command = [
"unstructured-ingest",
"confluence"
"confluence",
"--metadata-exclude", "filename,file_directory,metadata.data_source.date_processed",
"--url", "https://unstructured-ingest-test.atlassian.net",
"--user-email", "[email protected]",
@@ -58,7 +58,7 @@ Run Locally
Run via the API
---------------

You can also use upstream connectors with the ``unstructured`` API. For this you'll need to use the ``--partition-by-api`` flag and pass in your API key with ``--api-key``.
You can also use upstream connectors with the ``unstructured`` API. For this you'll need to use the ``--partition-by-api`` flag and pass in your API key with ``--api-key``.

.. tabs::

@@ -85,7 +85,7 @@ You can also use upstream connectors with the ``unstructured`` API. For this you

command = [
"unstructured-ingest",
"confluence"
"confluence",
"--metadata-exclude", "filename,file_directory,metadata.data_source.date_processed",
"--url", "https://unstructured-ingest-test.atlassian.net",
"--user-email", "[email protected]",
@@ -112,4 +112,4 @@ Additionaly, you will need to pass the ``--partition-endpoint`` if you're runnin

For a full list of the options the CLI accepts check ``unstructured-ingest confluence --help``.

NOTE: Keep in mind that you will need to have all the appropriate extras and dependencies for the file types of the documents contained in your data storage platform if you're running this locally. You can find more information about this in the `installation guide <https://unstructured-io.github.io/unstructured/installing.html>`_.
NOTE: Keep in mind that you will need to have all the appropriate extras and dependencies for the file types of the documents contained in your data storage platform if you're running this locally. You can find more information about this in the `installation guide <https://unstructured-io.github.io/unstructured/installing.html>`_.
12 changes: 6 additions & 6 deletions docs/source/upstream_connectors/discord.rst
@@ -1,6 +1,6 @@
Discord
==========
Connect Discord to your preprocessing pipeline, and batch process all your documents using ``unstructured-ingest`` to store structured outputs locally on your filesystem.
Connect Discord to your preprocessing pipeline, and batch process all your documents using ``unstructured-ingest`` to store structured outputs locally on your filesystem.

First you'll need to install the Discord dependencies as shown here.

@@ -34,7 +34,7 @@ Run Locally

command = [
"unstructured-ingest",
"discord"
"discord",
"--channels", "12345678",
"--token", "$DISCORD_TOKEN",
"--download-dir", "discord-ingest-download",
@@ -58,7 +58,7 @@ Run Locally
Run via the API
---------------

You can also use upstream connectors with the ``unstructured`` API. For this you'll need to use the ``--partition-by-api`` flag and pass in your API key with ``--api-key``.
You can also use upstream connectors with the ``unstructured`` API. For this you'll need to use the ``--partition-by-api`` flag and pass in your API key with ``--api-key``.

.. tabs::

@@ -73,7 +73,7 @@ You can also use upstream connectors with the ``unstructured`` API. For this you
--download-dir discord-ingest-download \
--structured-output-dir discord-example \
--preserve-downloads \
--verbose \
--verbose \
--partition-by-api \
--api-key "<UNSTRUCTURED-API-KEY>"

@@ -85,7 +85,7 @@ You can also use upstream connectors with the ``unstructured`` API. For this you

command = [
"unstructured-ingest",
"discord"
"discord",
"--channels", "12345678",
"--token", "$DISCORD_TOKEN",
"--download-dir", "discord-ingest-download",
@@ -112,4 +112,4 @@ Additionaly, you will need to pass the ``--partition-endpoint`` if you're runnin

For a full list of the options the CLI accepts check ``unstructured-ingest discord --help``.

NOTE: Keep in mind that you will need to have all the appropriate extras and dependencies for the file types of the documents contained in your data storage platform if you're running this locally. You can find more information about this in the `installation guide <https://unstructured-io.github.io/unstructured/installing.html>`_.
NOTE: Keep in mind that you will need to have all the appropriate extras and dependencies for the file types of the documents contained in your data storage platform if you're running this locally. You can find more information about this in the `installation guide <https://unstructured-io.github.io/unstructured/installing.html>`_.
115 changes: 115 additions & 0 deletions docs/source/upstream_connectors/jira.rst
@@ -0,0 +1,115 @@
Jira
==========
Connect Jira to your preprocessing pipeline, and batch process all your documents using ``unstructured-ingest`` to store structured outputs locally on your filesystem.

First you'll need to install the Jira dependencies as shown here.

.. code:: shell

pip install "unstructured[jira]"

Run Locally
-----------

.. tabs::

.. tab:: Shell

.. code:: shell

unstructured-ingest \
jira \
--metadata-exclude filename,file_directory,metadata.data_source.date_processed \
--url https://unstructured-jira-connector-test.atlassian.net \
--user-email [email protected] \
--api-token ABCDE1234ABDE1234ABCDE1234 \
--structured-output-dir jira-ingest-output \
--num-processes 2

.. tab:: Python

.. code:: python

import subprocess

command = [
"unstructured-ingest",
"jira",
"--metadata-exclude", "filename,file_directory,metadata.data_source.date_processed",
"--url", "https://unstructured-jira-connector-test.atlassian.net",
"--user-email", "[email protected]",
"--api-token", "ABCDE1234ABDE1234ABCDE1234",
"--structured-output-dir", "jira-ingest-output",
"--num-processes", "2",
]

# Run the command
process = subprocess.Popen(command, stdout=subprocess.PIPE)
output, error = process.communicate()

# Print output
if process.returncode == 0:
    print('Command executed successfully. Output:')
    print(output.decode())
else:
    print('Command failed. Error:')
    print(error.decode())

Run via the API
---------------

You can also use upstream connectors with the ``unstructured`` API. For this you'll need to use the ``--partition-by-api`` flag and pass in your API key with ``--api-key``.

.. tabs::

.. tab:: Shell

.. code:: shell

unstructured-ingest \
jira \
--metadata-exclude filename,file_directory,metadata.data_source.date_processed \
--url https://unstructured-jira-connector-test.atlassian.net \
--user-email [email protected] \
--api-token ABCDE1234ABDE1234ABCDE1234 \
--structured-output-dir jira-ingest-output \
--num-processes 2 \
--partition-by-api \
--api-key "<UNSTRUCTURED-API-KEY>"

.. tab:: Python

.. code:: python

import subprocess

command = [
"unstructured-ingest",
"jira",
"--metadata-exclude", "filename,file_directory,metadata.data_source.date_processed",
"--url", "https://unstructured-jira-connector-test.atlassian.net",
"--user-email", "[email protected]",
"--api-token", "ABCDE1234ABDE1234ABCDE1234",
"--structured-output-dir", "jira-ingest-output",
"--num-processes", "2",
"--partition-by-api",
"--api-key", "<UNSTRUCTURED-API-KEY>",
]

# Run the command
process = subprocess.Popen(command, stdout=subprocess.PIPE)
output, error = process.communicate()

# Print output
if process.returncode == 0:
    print('Command executed successfully. Output:')
    print(output.decode())
else:
    print('Command failed. Error:')
    print(error.decode())

Additionally, you will need to pass the ``--partition-endpoint`` if you're running the API locally. You can find more information about the ``unstructured`` API `here <https://github.com/Unstructured-IO/unstructured-api>`_.

For a full list of the options the CLI accepts check ``unstructured-ingest jira --help``.

NOTE: Keep in mind that you will need to have all the appropriate extras and dependencies for the file types of the documents contained in your data storage platform if you're running this locally. You can find more information about this in the `installation guide <https://unstructured-io.github.io/unstructured/installing.html>`_.
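The ``--url``, ``--user-email``, and ``--api-token`` options above map onto the ``atlassian-python-api`` client that this connector pulls in (see ``requirements/ingest-jira.txt`` further down). As a minimal sketch, reusing the placeholder credentials from the example above, the same credentials can be sanity-checked before a long ingestion run; the ``Jira`` client and its ``myself()`` and ``projects()`` calls come from ``atlassian-python-api``, not from this PR:

.. code:: python

    from atlassian import Jira

    # Placeholder Jira Cloud credentials, matching the documentation example above.
    jira = Jira(
        url="https://unstructured-jira-connector-test.atlassian.net",
        username="[email protected]",
        password="ABCDE1234ABDE1234ABCDE1234",  # Jira Cloud API token
        cloud=True,
    )

    # myself() returns the authenticated user's profile when the token is valid.
    print(jira.myself()["displayName"])

    # Project keys listed here can be passed to --list-of-projects to scope ingestion.
    for project in jira.projects():
        print(project["key"], project["name"])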
41 changes: 41 additions & 0 deletions examples/ingest/jira/ingest.sh
@@ -0,0 +1,41 @@
#!/usr/bin/env bash

# Processes all the issues in all projects within a jira domain, using the `unstructured` library.

# Structured outputs are stored in jira-ingest-output
SCRIPT_DIR=$(dirname "$(realpath "$0")")
cd "$SCRIPT_DIR"/../../.. || exit 1

# Required arguments:
# --url
# --> Atlassian (Jira) domain URL
# --api-token
# --> Api token to authenticate into Atlassian (Jira).
# Check https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/ for more info.
# --user-email
# --> User email for the domain, such as [email protected]

# Optional arguments:
# --list-of-projects
# --> Space separated project ids or keys
# --list-of-boards
# --> Space separated board ids or keys
# --list-of-issues
# --> Space separated issue ids or keys

# Note: When any of the optional arguments are provided, connector will ingest only those components, and nothing else.
# When none of the optional arguments are provided, all issues in all projects will be ingested.


PYTHONPATH=. ./unstructured/ingest/main.py \
jira \
--metadata-exclude filename,file_directory,metadata.data_source.date_processed \
--url https://unstructured-jira-connector-test.atlassian.net \
--user-email "$JIRA_USER_EMAIL" \
--api-token "$JIRA_API_TOKEN" \
--structured-output-dir jira-ingest-output \
--num-processes 2 \
--reprocess
# --list-of-projects <your project keys/ids here (space separated)> \
# --list-of-boards <your board keys/ids here (space separated)> \
# --list-of-issues <your issue keys/ids here (space separated)> \
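The commented ``--list-of-projects``, ``--list-of-boards``, and ``--list-of-issues`` flags above scope ingestion to specific components. As a minimal sketch, mirroring the ``subprocess`` pattern used in the documentation, the same run could be limited to a single project; the key ``"PROJ"`` is a placeholder, and passing one key per flag is assumed here:

.. code:: python

    import os
    import subprocess

    command = [
        "unstructured-ingest",
        "jira",
        "--url", "https://unstructured-jira-connector-test.atlassian.net",
        "--user-email", os.environ["JIRA_USER_EMAIL"],
        "--api-token", os.environ["JIRA_API_TOKEN"],
        "--list-of-projects", "PROJ",  # placeholder project key
        "--structured-output-dir", "jira-ingest-output",
        "--num-processes", "2",
    ]

    # Fails loudly if the connector exits with a non-zero status.
    subprocess.run(command, check=True)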
3 changes: 3 additions & 0 deletions requirements/ingest-jira.in
@@ -0,0 +1,3 @@
-c constraints.in
-c base.txt
atlassian-python-api
43 changes: 43 additions & 0 deletions requirements/ingest-jira.txt
@@ -0,0 +1,43 @@
#
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
# pip-compile requirements/ingest-jira.in
#
atlassian-python-api==3.41.1
# via -r requirements/ingest-jira.in
certifi==2023.7.22
# via
# -c requirements/base.txt
# -c requirements/constraints.in
# requests
charset-normalizer==3.2.0
# via
# -c requirements/base.txt
# requests
deprecated==1.2.14
# via atlassian-python-api
idna==3.4
# via
# -c requirements/base.txt
# requests
oauthlib==3.2.2
# via
# atlassian-python-api
# requests-oauthlib
requests==2.31.0
# via
# -c requirements/base.txt
# atlassian-python-api
# requests-oauthlib
requests-oauthlib==1.3.1
# via atlassian-python-api
six==1.16.0
# via atlassian-python-api
urllib3==1.26.16
# via
# -c requirements/base.txt
# -c requirements/constraints.in
# requests
wrapt==1.15.0
# via deprecated
1 change: 1 addition & 0 deletions setup.py
@@ -149,6 +149,7 @@ def load_requirements(file_list: Optional[Union[str, List[str]]] = None) -> List
"sharepoint": load_requirements("requirements/ingest-sharepoint.in"),
"delta-table": load_requirements("requirements/ingest-delta-table.in"),
"salesforce": load_requirements("requirements/ingest-salesforce.in"),
"jira": load_requirements("requirements/ingest-jira.in"),
# Legacy extra requirements
"huggingface": load_requirements("requirements/huggingface.in"),
"local-inference": all_doc_reqs,