
Dataframe-ify the output of Anserini-Spark #13

Open
wants to merge 41 commits into master
Conversation

mayankanand007
Contributor

  1. PySpark: Converting the PySpark RDD to a DataFrame was successful, and all DataFrame operations work as expected.
  2. Scala Spark: There are issues converting `docs2`, which has the type `org.apache.spark.api.java.JavaRDD[java.util.HashMap[String,String]]`, to a DataFrame in Scala: conventional options such as `toDF()` or `spark.createDataFrame` do not support arguments of this particular type. We still need to figure out how to do the conversion if we want to read the entire document as a HashMap. There is also no direct method that works with a Scala RDD.
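For item 2, one possible workaround is to go through an `RDD[Row]` with an explicit schema: `spark.createDataFrame` does accept an `RDD[Row]` plus a `StructType`, so each `java.util.HashMap` can be converted to an immutable Scala `Map` and wrapped in a `Row` under a single `MapType` column. This is only a sketch, not tested against Anserini-Spark; it assumes `docs2` is the `JavaRDD[java.util.HashMap[String,String]]` described above and `spark` is an active `SparkSession`.

```scala
import scala.collection.JavaConverters._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{MapType, StringType, StructField, StructType}

// Assumption: docs2 is the JavaRDD[java.util.HashMap[String, String]] from this PR,
// and spark is an active SparkSession.
val rowRdd = docs2.rdd.map { m =>
  // Convert the Java HashMap to an immutable Scala Map so Spark can encode it.
  Row(m.asScala.toMap)
}

// One column holding the whole document as a map<string, string>.
val schema = StructType(Seq(StructField("doc", MapType(StringType, StringType))))
val df = spark.createDataFrame(rowRdd, schema)

// Individual fields can then be projected out, e.g.:
// df.select(df("doc").getItem("id").as("id")).show()
```

If per-field columns are ultimately wanted, an alternative is to map each HashMap to a case class (or a fixed tuple of known fields) before calling `toDF()`, which sidesteps the `MapType` column entirely at the cost of fixing the schema up front.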

@lintool, do let me know if this is what you were expecting; I can make changes accordingly.
