java.io.IOException: alluxio.exception.DirectoryNotEmptyException: Failed to delete 3 paths from the under file system:
/kdl/pre-kde/C4CA4238A0B923820DCC509A6F75849B/hive/idcp_prod/driving_score_rcd_datatx_merge/_temporary/0/_temporary (UFS dir not in sync. Sync UFS, or delete with unchecked flag.),
/kdl/pre-kde/C4CA4238A0B923820DCC509A6F75849B/hive/idcp_prod/driving_score_rcd_datatx_merge/_temporary/0 (Directory not empty),
/kdl/pre-kde/C4CA4238A0B923820DCC509A6F75849B/hive/idcp_prod/driving_score_rcd_datatx_merge/_temporary (Directory not empty)
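The exception message itself names the two ways out: sync the UFS metadata for that subtree, or delete with the unchecked flag. A sketch using the Alluxio shell (assuming an Alluxio 2.x CLI; the path is the `_temporary` directory from the trace, and flag names should be checked against the installed version):

```shell
# Force a metadata sync on the affected subtree by overriding the
# sync-interval cache for this one command (interval 0 means "always sync").
alluxio fs ls -R -Dalluxio.user.file.metadata.sync.interval=0 \
    /kdl/pre-kde/C4CA4238A0B923820DCC509A6F75849B/hive/idcp_prod/driving_score_rcd_datatx_merge/_temporary

# Alternatively, delete without checking that Alluxio and the UFS agree --
# the "unchecked flag" the exception refers to.
alluxio fs rm -R -U \
    /kdl/pre-kde/C4CA4238A0B923820DCC509A6F75849B/hive/idcp_prod/driving_score_rcd_datatx_merge/_temporary
```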
```python
# Reproduction: append to a partitioned ORC Hive table through Alluxio.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("alluxio-metadata-repro") \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql("""
CREATE TABLE IF NOT EXISTS tmx_hive1.day_table_orc12(
    id INT,
    content STRING
)
PARTITIONED BY (dt STRING)
STORED AS ORC
""")

rows_per_file = 2
total_files = 6
total_rows = rows_per_file * total_files

# generate_fixed_length_string is a helper from the original job (definition not shown).
data = [(i, generate_fixed_length_string(100), "20241213") for i in range(total_rows)]
columns = ["id", "content", "dt"]
df = spark.createDataFrame(data, columns)
# print(f"Original number of partitions: {df.rdd.getNumPartitions()}")

df_repartitioned = df.repartition(total_files, "id")
df_repartitioned.write \
    .mode("append") \
    .format("hive") \
    .partitionBy("dt") \
    .saveAsTable("tmx_hive1.day_table_orc12")
spark.stop()
```
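The job calls a helper whose definition is not shown. A minimal stand-in (hypothetical; the original implementation may differ) that produces a payload of exactly the requested length:

```python
import random
import string


def generate_fixed_length_string(length: int) -> str:
    """Hypothetical stand-in for the helper used by the job above:
    returns a random ASCII string of exactly `length` characters."""
    return "".join(random.choices(string.ascii_letters + string.digits, k=length))
```

With this stand-in, each row of `data` carries a 100-character `content` value, so the six output files stay roughly equal in size.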
Alluxio master logs while the PySpark job runs:
master-rpc-executor-TPE-thread-159 Failed to sync metadata on root path InodeSyncStream{rootPath=LockingScheme{path=/kcde/hdfs/tmx/C20AD4D76FE97759AA27A0C99BFF6710/hive/tmx_hive1/day_table_orc12/dt=20241213, desiredLockPattern=READ, shouldSync={Should sync: false, Last sync time: 1734508706428}}, descendantType=NONE, commonOptions=syncIntervalMs: -1 ttl: -1 ttlAction: DELETE, forceSync=true} because it does not exist on the UFS or in Alluxio
2024-12-18 16:32:02,973 INFO master-rpc-executor-TPE-thread-169 Updating inode 'part-00001-e18c745a-1855-4efb-832c-16ec0f0985e9.c000' mode bits from rw-r--r-- to rwxrwxrwx
2024-12-18 16:32:02,980 INFO master-rpc-executor-TPE-thread-85 Updating inode 'part-00002-e18c745a-1855-4efb-832c-16ec0f0985e9.c000' mode bits from rw-r--r-- to rwxrwxrwx
2024-12-18 16:32:02,985 INFO master-rpc-executor-TPE-thread-187 Updating inode 'part-00003-e18c745a-1855-4efb-832c-16ec0f0985e9.c000' mode bits from rw-r--r-- to rwxrwxrwx
2024-12-18 16:32:02,990 INFO master-rpc-executor-TPE-thread-183 Updating inode 'part-00005-e18c745a-1855-4efb-832c-16ec0f0985e9.c000' mode bits from rw-r--r-- to rwxrwxrwx
2024-12-18 16:32:04,670 WARN master-rpc-executor-TPE-thread-92 Failed to sync metadata on root path InodeSyncStream{rootPath=LockingScheme{path=/kdl/kcde/spark/spark3-history/0/application_1731383355780_0129_1, desiredLockPattern=READ, shouldSync={Should sync: false, Last sync time: 0}}, descendantType=NONE, commonOptions=syncIntervalMs: -1 ttl: -1 ttlAction: DELETE, forceSync=true} because it does not exist on the UFS or in Alluxio
Problem scenario: metadata inconsistency between Alluxio and the UFS, triggered by the PySpark job above.
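To confirm whether the Alluxio namespace has actually drifted from the under file system for the partition path in the warning, the two sides can be listed and compared (a sketch; the `hdfs://<namenode>` prefix and the UFS path mapping are assumptions that depend on the cluster's mount configuration):

```shell
# Listing as seen by Alluxio (served from cached metadata).
alluxio fs ls -R /kcde/hdfs/tmx/C20AD4D76FE97759AA27A0C99BFF6710/hive/tmx_hive1/day_table_orc12

# Listing as seen by the under file system directly; hdfs://<namenode> is a
# placeholder -- replace with the real UFS address for this mount point.
hdfs dfs -ls -R hdfs://<namenode>/kcde/hdfs/tmx/C20AD4D76FE97759AA27A0C99BFF6710/hive/tmx_hive1/day_table_orc12

# Entries present on one side but not the other indicate stale Alluxio metadata.
```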