
Commit

334 handle spark session creation when already exists (#335)
* Enhance demo notebook and SparkModel.js for improved Spark session management

- Added a new code cell in the demo notebook to initialize a Spark session with detailed configuration settings, including application ID and Spark UI link.
- Updated SparkModel.js to check for existing Spark application IDs before storing new session information, improving error handling and preventing duplicate entries (a rough sketch of this guard follows below).
- Enhanced logging for better visibility into Spark session creation and management processes.
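
A rough sketch of the guard described above might look like the following. The method name storeSparkInfo, the endpoint path, and the lastStoredAppId property are assumptions for illustration only, not the project's exact code.

```javascript
// Hypothetical sketch of SparkModel.storeSparkInfo with a duplicate guard;
// names and the endpoint path are assumptions, only the check-before-store
// behavior mirrors the change described in the commit message.
class SparkModel {
  static lastStoredAppId = null;

  static async storeSparkInfo(sparkAppId, notebookPath) {
    // Skip storage when this application ID was already recorded.
    if (!sparkAppId || sparkAppId === SparkModel.lastStoredAppId) {
      console.log(`Spark app ${sparkAppId} already stored, skipping duplicate entry`);
      return;
    }

    const response = await fetch(`/api/spark_app/${encodeURIComponent(sparkAppId)}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ notebookPath }),
    });
    if (!response.ok) {
      throw new Error(`Failed to store Spark app info: ${response.status}`);
    }

    SparkModel.lastStoredAppId = sparkAppId;
    console.log(`Stored Spark app ${sparkAppId} for notebook ${notebookPath}`);
  }
}
```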

* Update Dockerfile to install npm dependencies with legacy peer dependencies flag

- Modified the npm install command to include the --legacy-peer-deps flag, ensuring compatibility with older peer dependencies during the build process of the React application.

* Update Dockerfile to use Node 18 and enhance build process

- Upgraded Node.js version from 14 to 18 for improved performance and compatibility.
- Cleared npm cache before installing dependencies to ensure a clean environment.
- Added installation of @jridgewell/gen-mapping to support additional functionality.
- Increased memory allocation for the build process by setting NODE_OPTIONS to --max-old-space-size=4096 (4096 MB).

* Update Dockerfile to use Node 14 and streamline build process

- Downgraded Node.js version from 18 to 14 for compatibility.
- Simplified npm installation by removing cache cleaning and legacy peer dependencies flag.
- Removed increased memory allocation for the build process, optimizing the Dockerfile for a more straightforward build.

* Update Dockerfile to use Node 18 and optimize build process

- Upgraded Node.js version from 14 to 18 for improved performance.
- Implemented a clean install of npm dependencies with legacy peer dependencies support.
- Added specific package installations for @jridgewell/gen-mapping and @babel/generator.
- Increased memory allocation for the build process by setting NODE_OPTIONS to --max-old-space-size=4096 (4096 MB).

* Refactor Dockerfile for improved npm dependency management and build process

- Updated npm installation commands to set legacy peer dependencies and install packages in a specific order.
- Cleaned npm cache and rebuilt before running the build command to ensure a fresh environment.
- Increased clarity and efficiency in the Dockerfile setup for the web application.

* Enhance Spark session management and update demo notebook

- Updated demo notebook to reflect successful Spark session execution, including updated execution metadata and application ID.
- Refactored Spark session creation in the backend to streamline the process, removing unnecessary parameters and improving error handling.
- Modified SparkModel.js to ensure proper session initialization and validation of Spark application IDs.
- Improved logging for better visibility during Spark session creation and management processes.

* Refactor demo notebook and SparkModel.js for improved Spark session handling

- Removed outdated code cells from the demo notebook to enhance clarity and usability.
- Updated SparkModel.js to improve validation of Spark application IDs, ensuring they start with 'app-' and are correctly extracted from the HTML (see the sketch after this list).
- Simplified the logic for storing Spark session information in the Notebook component, enhancing overall session management.
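
A sketch of the validation just described, based on the session-info HTML visible in the demo notebook diff further down; the regular expression and function names are assumptions.

```javascript
// Hypothetical helpers mirroring the validation described above; the regex is
// an assumption based on the "Application ID: app-..." HTML shown in the demo
// notebook output in this commit.
function isSparkInfo(html) {
  return typeof html === 'string' && html.includes('Spark Session Information');
}

function extractSparkAppId(html) {
  // Pull the ID out of the rendered session-info HTML and require the app- prefix.
  const match = html.match(/Application ID:<\/strong>\s*(app-[\w-]+)/);
  return match && match[1].startsWith('app-') ? match[1] : null;
}
```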

* Refactor Notebook.js and SparkModel.js for improved Spark app ID handling

- Updated Notebook.js to extract and store Spark app ID more efficiently, ensuring it is only stored if valid.
- Enhanced logging to provide clearer visibility of the extracted Spark app ID.
- Added a console log in SparkModel.js to confirm successful extraction of the Spark app ID, improving debugging capabilities.

* Refactor Notebook.js to streamline Spark app ID logging

- Removed redundant console log for Spark app ID and retained a single log statement for clarity.
- Enhanced error handling in the Notebook component to ensure better debugging during cell execution.

* Implement Spark app status endpoint and enhance logging in SparkModel.js

- Added a new endpoint to retrieve the status of a Spark application by its ID in spark_app.py, improving the API's functionality.
- Enhanced logging in SparkModel.js to provide better visibility during the storage process of Spark application information, including status checks and error handling (a client-side sketch of the status check follows below).
- Improved validation for Spark application IDs to ensure only valid IDs are processed, contributing to more robust error management.
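
On the client side, the status check might be wired in roughly as follows. The /api base path and function name are assumptions; the response shapes (a status field on success, 404 when the application is unknown) follow the route added in the spark_app.py diff below.

```javascript
// Hypothetical client-side call to the new status endpoint; the /api base
// path is an assumption, the response shape follows the spark_app.py diff.
async function getSparkAppStatus(sparkAppId) {
  const response = await fetch(`/api/spark_app/${encodeURIComponent(sparkAppId)}/status`);
  if (response.status === 404) {
    console.log(`Spark app ${sparkAppId} not found on the server`);
    return null;
  }
  if (!response.ok) {
    throw new Error(`Status check failed with HTTP ${response.status}`);
  }
  const { status } = await response.json();
  console.log(`Spark app ${sparkAppId} status: ${status}`);
  return status;
}
```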

* Refactor Spark app status retrieval and enhance error handling

- Moved the Spark app status retrieval logic from the route handler in spark_app.py to a static method in the SparkApp class for better separation of concerns.
- Improved error handling and logging in the new get_spark_app_status method, ensuring clearer responses for application not found and internal errors.
- Simplified the route handler to directly return the response from the SparkApp method, enhancing code readability and maintainability.

* Enhance notebook path handling in getSparkApps method

- Safely handle notebook paths by simplifying them when they match the pattern work/user@domain/notebook.ipynb (a rough sketch follows this list).
- Improved clarity by logging the simplified notebook path for better debugging and visibility.
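
The simplification might look roughly like the sketch below. Both the regular expression and the choice to keep only the notebook file name are assumptions; a later change in this series removes this logic again in favor of using the provided path directly.

```javascript
// Hypothetical path simplification; the regex and the decision to keep only
// the file name are assumptions based on the pattern named above.
function simplifyNotebookPath(notebookPath) {
  const match = notebookPath.match(/^work\/[^/@]+@[^/]+\/([^/]+\.ipynb)$/);
  if (match) {
    console.log(`Simplified notebook path: ${match[1]}`);
    return match[1];
  }
  return notebookPath; // leave non-matching paths untouched
}
```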

* Refactor Spark app route and simplify notebook path handling

- Removed unused JWT and user identification decorators from the Spark app route in spark_app.py for cleaner code.
- Simplified the notebook path handling in the getSparkApps method of NotebookModel.js by removing unnecessary path simplification logic, allowing direct usage of the provided notebook path (see the sketch after this list).
- Enhanced code readability and maintainability by streamlining the logic in both files.
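
A sketch of the simplified method described above; the endpoint path and response handling are assumptions — the relevant point is that the notebook path is now passed through unchanged.

```javascript
// Hypothetical sketch of the simplified getSparkApps in NotebookModel.js; the
// endpoint path is an assumption, and the notebook path is used as provided.
async function getSparkApps(notebookPath) {
  const response = await fetch(`/api/notebook/${encodeURIComponent(notebookPath)}/spark_apps`);
  if (!response.ok) {
    throw new Error(`Failed to fetch Spark apps: ${response.status}`);
  }
  return response.json(); // list of Spark applications linked to this notebook
}
```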

* Add create_spark_app endpoint and enhance error handling in SparkApp service

* Enhance create_spark_app endpoint with user authentication and error handling

- Added JWT authentication and user identification decorators to the create_spark_app route in spark_app.py to ensure only authenticated users can create Spark applications.
- Implemented user context validation in the SparkApp service, returning a 401 response if the user is not found.
- Added a database rollback mechanism on error during Spark app creation to maintain data integrity.

* Enhance Spark session management and update demo notebook

- Updated demo notebook to include successful Spark session execution details, including execution metadata and application ID.
- Removed the create_spark_session endpoint from spark_app.py to streamline session management.
- Refactored SparkApp service by removing the create_spark_session method, as session creation is now handled directly in the notebook.
- Improved SparkModel.js to ensure proper validation and storage of Spark application IDs, including enhanced logging for better visibility during the process.
xuwenyihust authored Dec 10, 2024
1 parent 02e3486 commit 3b2dca7
Showing 6 changed files with 192 additions and 131 deletions.
33 changes: 33 additions & 0 deletions examples/[email protected]/demo.ipynb
@@ -10,6 +10,39 @@
"- This is just a demo notebook\n",
"- For testing only"
]
},
{
"cell_type": "code",
"isExecuted": true,
"lastExecutionResult": "success",
"lastExecutionTime": "2024-12-10 10:26:03",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
" <div style=\"border: 1px solid #e8e8e8; padding: 10px;\">\n",
" <h3>Spark Session Information</h3>\n",
" <p><strong>Config:</strong> {'spark.driver.memory': '1g', 'spark.driver.cores': 1, 'spark.executor.memory': '1g', 'spark.executor.cores': 1, 'spark.executor.instances': 1, 'spark.dynamicAllocation.enabled': False}</p>\n",
" <p><strong>Application ID:</strong> app-20241210080310-0003</p>\n",
" <p><strong>Spark UI:</strong> <a href=\"http://localhost:18080/history/app-20241210080310-0003\">http://localhost:18080/history/app-20241210080310-0003</a></p>\n",
" </div>\n",
" "
],
"text/plain": [
"Custom Spark Session (App ID: app-20241210080310-0003) - UI: http://0edb0a63b2fb:4040"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"spark = create_spark(\"work/[email protected]/demo.ipynb\")\n",
"spark"
]
}
],
"metadata": {
34 changes: 11 additions & 23 deletions server/app/routes/spark_app.py
@@ -8,14 +8,6 @@

logging.basicConfig(level=logging.INFO)

@spark_app_blueprint.route('/spark_app/<path:spark_app_id>', methods=['POST'])
def create_spark_app(spark_app_id):
data = request.get_json()
notebook_path = data.get('notebookPath', None)
return SparkApp.create_spark_app(spark_app_id=spark_app_id, notebook_path=notebook_path)

# @jwt_required()
# @identify_user
@spark_app_blueprint.route('/spark_app/<path:notbook_path>/config', methods=['GET'])
def get_spark_app_config(notbook_path):
logging.info(f"Getting spark app config for notebook path: {notbook_path}")
@@ -27,20 +19,16 @@ def update_spark_app_config(notbook_path):
data = request.get_json()
return SparkApp.update_spark_app_config_by_notebook_path(notbook_path, data)

@spark_app_blueprint.route('/spark_app/session', methods=['POST'])
def create_spark_session():
@spark_app_blueprint.route('/spark_app/<spark_app_id>/status', methods=['GET'])
def get_spark_app_status(spark_app_id):
logging.info(f"Getting spark app status for app id: {spark_app_id}")
return SparkApp.get_spark_app_status(spark_app_id)

@spark_app_blueprint.route('/spark_app/<spark_app_id>', methods=['POST'])
@jwt_required()
@identify_user
def create_spark_app(spark_app_id):
logging.info(f"Creating spark app with id: {spark_app_id}")
data = request.get_json()
notebook_path = data.get('notebookPath')
spark_config = data.get('config')

try:
spark_app_id = SparkApp.create_spark_session(notebook_path, spark_config)
return jsonify({
'status': 'success',
'sparkAppId': spark_app_id
})
except Exception as e:
return jsonify({
'status': 'error',
'message': str(e)
}), 500
return SparkApp.create_spark_app(spark_app_id=spark_app_id, notebook_path=notebook_path)
103 changes: 62 additions & 41 deletions server/app/services/spark_app.py
@@ -1,7 +1,7 @@
from app.models.spark_app import SparkAppModel
from app.models.notebook import NotebookModel
from app.models.spark_app_config import SparkAppConfigModel
from flask import Response
from flask import g, Response
from datetime import datetime
import json
from database import db
@@ -143,45 +143,66 @@ def update_spark_app_config_by_notebook_path(notebook_path: str = None, data: di
status=200)

@staticmethod
def create_spark_app(spark_app_id: str = None, notebook_path: str = None):
logger.info(f"Creating spark app with id: {spark_app_id} for notebook path: {notebook_path}")

if spark_app_id is None:
logger.error("Spark app id is None")
return Response(
response=json.dumps({'message': 'Spark app id is None'}),
status=404)

if notebook_path is None:
logger.error("Notebook path is None")
return Response(
response=json.dumps({'message': 'Notebook path is None'}),
status=404)

def get_spark_app_status(spark_app_id: str):
logger.info(f"Getting spark app status for app id: {spark_app_id}")
try:
# Get the notebook id
notebook = NotebookModel.query.filter_by(path=notebook_path).first()
notebook_id = notebook.id

# Create the spark app
spark_app = SparkAppModel(
spark_app_id=spark_app_id,
notebook_id=notebook_id,
user_id=notebook.user_id,
created_at=datetime.now().strftime("%Y-%m-%d %H:%M:%S")
)

db.session.add(spark_app)
db.session.commit()

logger.info(f"Spark app created: {spark_app}")
spark_app = SparkAppModel.query.filter_by(spark_app_id=spark_app_id).first()
if spark_app is None:
logger.error("Spark application not found")
return Response(
response=json.dumps({'message': 'Spark application not found'}),
status=404
)
return Response(
response=json.dumps({'status': spark_app.status}),
status=200
)
except Exception as e:
logger.error(f"Error creating spark app: {e}")
return Response(
response=json.dumps({'message': 'Error creating spark app: ' + str(e)}),
status=404)

return Response(
response=json.dumps(spark_app.to_dict()),
status=200
)
logger.error(f"Error getting spark app status: {e}")
return Response(
response=json.dumps({'message': str(e)}),
status=500
)

@staticmethod
def create_spark_app(spark_app_id: str, notebook_path: str):
logger.info(f"Creating spark app with id: {spark_app_id} for notebook: {notebook_path}")
try:
if not g.user:
logger.error("User not found in context")
return Response(
response=json.dumps({'message': 'User not authenticated'}),
status=401
)

# Get the notebook
notebook = NotebookModel.query.filter_by(path=notebook_path).first()
if notebook is None:
logger.error("Notebook not found")
return Response(
response=json.dumps({'message': 'Notebook not found'}),
status=404
)

# Create new spark app
spark_app = SparkAppModel(
spark_app_id=spark_app_id,
notebook_id=notebook.id,
user_id=g.user.id,
created_at=datetime.now()
)

db.session.add(spark_app)
db.session.commit()

return Response(
response=json.dumps(spark_app.to_dict()),
status=200
)
except Exception as e:
logger.error(f"Error creating spark app: {e}")
db.session.rollback() # Add rollback on error
return Response(
response=json.dumps({'message': str(e)}),
status=500
)
22 changes: 19 additions & 3 deletions webapp/Dockerfile
@@ -1,10 +1,26 @@
# Stage 1: Build the React application
FROM node:14 as build
FROM node:18 as build
WORKDIR /app

# Copy package files
COPY package*.json ./
RUN npm install

# Clean and setup npm
RUN npm cache clean --force && \
npm set legacy-peer-deps=true

# Install dependencies in a specific order
RUN npm install && \
npm install @jridgewell/[email protected] && \
npm install @babel/[email protected] && \
npm install @babel/[email protected]

# Copy the rest of the application
COPY . .
RUN npm run build

# Build with increased memory limit
ENV NODE_OPTIONS="--max-old-space-size=4096"
RUN npm rebuild && npm run build

# Stage 2: Serve the app with nginx
FROM nginx:stable-alpine
16 changes: 8 additions & 8 deletions webapp/src/components/notebook/Notebook.js
@@ -257,11 +257,13 @@ function Notebook({

// Check if contains a spark app id
if (result[0] && result[0].data && result[0].data['text/html'] && SparkModel.isSparkInfo(result[0].data['text/html'])) {
setSparkAppId(SparkModel.extractSparkAppId(result[0].data['text/html']));
SparkModel.storeSparkInfo(SparkModel.extractSparkAppId(result[0].data['text/html']), notebook.path)
const appId = SparkModel.extractSparkAppId(result[0].data['text/html']);
setSparkAppId(appId);
if (appId) {
SparkModel.storeSparkInfo(appId, notebook.path);
}
console.log('Spark app id:', appId);
}
console.log('Spark app id:', sparkAppId);

} catch (error) {
console.error('Failed to execute cell:', error);
}
@@ -288,7 +290,7 @@ const handleCreateSparkSession = async () => {
const handleCreateSparkSession = async () => {
console.log('Create Spark session clicked');
try {
const { sparkAppId, initializationCode } = await SparkModel.createSparkSession(notebookState.path);
const { initializationCode } = await SparkModel.createSparkSession(notebookState.path);

// Create a new cell with the initialization code
const newCell = {
Expand All @@ -306,12 +308,10 @@ function Notebook({
content: { ...notebookState.content, cells }
});

// Execute the cell (now need to use the last index)
// Execute the cell
const newCellIndex = cells.length - 1;
await handleRunCodeCell(newCell, CellStatus.IDLE, (status) => setCellStatus(newCellIndex, status));

console.log('Spark session created with ID:', sparkAppId);
setSparkAppId(sparkAppId);
} catch (error) {
console.error('Failed to create Spark session:', error);
alert('Failed to create Spark session. Please check the configuration.');