gen_test_train_data/gen_all.lua (Step. 10) doesn't work #21

Closed
titsuki opened this issue Feb 17, 2019 · 6 comments

@titsuki

titsuki commented Feb 17, 2019

Hi,
Step 10 doesn't work in my environment.
I suspect something in my environment is set up incorrectly.
Could you tell me which Lua and Linux versions the paper's experiments were run on?

Steps to reproduce:

# Status: Steps 1 to 9 are done
root@96a4f2e0d6cd:~/deep-ed# th data_gen/gen_test_train_data/gen_all.lua -root_data_dir /root/
==> Loading redirects index	
    Done loading redirects index	
==> Loading entity wikiid - name map	
  ---> from t7 file: /root/generated/ent_name_id_map.t7	
    Done loading entity name - wikiid. Size thid index = 4306070	
==> Loading crosswikis_wikipedia from file /root/generated/crosswikis_wikipedia_p_e_m.txt	
Processed 2000000 lines. 	
Processed 4000000 lines. 	
Processed 6000000 lines. 	
Processed 8000000 lines. 	
Processed 10000000 lines. 	
Processed 12000000 lines. 	
==> Loading yago index from file /root/generated/yago_p_e_m.txt	
Processed 2000000 lines. 	
Processed 4000000 lines. 	
Processed 6000000 lines. 	
Processed 8000000 lines. 	
Processed 10000000 lines. 	
Processed 12000000 lines. 	
    Done loading index	

Generating test data from AIDA set 	
Entity Derek Ryan not found. Redirects file needs to be loaded for better performance.	
Entity Iván García not found. Redirects file needs to be loaded for better performance.	
Entity Akhbar not found. Redirects file needs to be loaded for better performance.	
Entity Michael Andersson not found. Redirects file needs to be loaded for better performance.	
Entity Michael Andersson not found. Redirects file needs to be loaded for better performance.	
Entity Oksana Grishina not found. Redirects file needs to be loaded for better performance.	
Entity Craig Brown not found. Redirects file needs to be loaded for better performance.	
Entity John Collins not found. Redirects file needs to be loaded for better performance.	
Entity International Boxing Association not found. Redirects file needs to be loaded for better performance.	
Entity Ramón Ramírez not found. Redirects file needs to be loaded for better performance.	
Done validation testA : 	
num_nme = 1126; num_nonexistent_ent_title = 3189	
num_nonexistent_ent_id = 0; num_nonexistent_both = 35	
num_correct_ents = 1567; num_total_ents = 4791	
Entity World Open not found. Redirects file needs to be loaded for better performance.	
Entity Douglas Young not found. Redirects file needs to be loaded for better performance.	
Entity Douglas Young not found. Redirects file needs to be loaded for better performance.	
Entity James Love not found. Redirects file needs to be loaded for better performance.	
Entity Noel Whelan not found. Redirects file needs to be loaded for better performance.	
    Done AIDA.	
num_nme = 2257; num_nonexistent_ent_title = 6255	
num_nonexistent_ent_id = 0; num_nonexistent_both = 72	
num_correct_ents = 2949; num_total_ents = 9276	

Generating train data from AIDA set 	
Entity Craig Brown not found. Redirects file needs to be loaded for better performance.	
Entity International cricketers of South African origin not found. Redirects file needs to be loaded for better performance.	
Entity Jonathan Stark not found. Redirects file needs to be loaded for better performance.	
Entity Carlos Costa not found. Redirects file needs to be loaded for better performance.	
Entity Antonio Esposito not found. Redirects file needs to be loaded for better performance.	
Entity Independence Day (disambiguation) not found. Redirects file needs to be loaded for better performance.	
Entity Independence Day (disambiguation) not found. Redirects file needs to be loaded for better performance.	
Entity Erik Hanson not found. Redirects file needs to be loaded for better performance.	
Entity Erik Hanson not found. Redirects file needs to be loaded for better performance.	
Entity Iván García not found. Redirects file needs to be loaded for better performance.	
Entity Camelot, Chesapeake, Virginia not found. Redirects file needs to be loaded for better performance.	
Entity Jonathan Stark not found. Redirects file needs to be loaded for better performance.	
Entity Gordon Parsons not found. Redirects file needs to be loaded for better performance.	
Entity Xhosa not found. Redirects file needs to be loaded for better performance.	
Entity Xhosa not found. Redirects file needs to be loaded for better performance.	
Entity Jamaat-e-Islami not found. Redirects file needs to be loaded for better performance.	
Entity Ford Escort not found. Redirects file needs to be loaded for better performance.	
Entity Ford Escort not found. Redirects file needs to be loaded for better performance.	
Entity Franz Konrad not found. Redirects file needs to be loaded for better performance.	
Entity Ford Escort not found. Redirects file needs to be loaded for better performance.	
Entity Carlos Costa not found. Redirects file needs to be loaded for better performance.	
Entity Craig Evans not found. Redirects file needs to be loaded for better performance.	
Entity Preston not found. Redirects file needs to be loaded for better performance.	
Entity Superman (disambiguation) not found. Redirects file needs to be loaded for better performance.	
Entity Superman (disambiguation) not found. Redirects file needs to be loaded for better performance.	
Entity Jonathan Stark not found. Redirects file needs to be loaded for better performance.	
Entity Ashta not found. Redirects file needs to be loaded for better performance.	
Entity John Smiley not found. Redirects file needs to be loaded for better performance.	
Entity Derek Ryan not found. Redirects file needs to be loaded for better performance.	
Entity Michael Andersson not found. Redirects file needs to be loaded for better performance.	
Entity Michael Andersson not found. Redirects file needs to be loaded for better performance.	
Entity Oksana Grishina not found. Redirects file needs to be loaded for better performance.	
Entity Derek Ryan not found. Redirects file needs to be loaded for better performance.	
Entity Bandundu not found. Redirects file needs to be loaded for better performance.	
Entity Čelopek not found. Redirects file needs to be loaded for better performance.	
    Done AIDA.	
num_nme = 4855; num_nonexistent_ent_title = 12103	
num_nonexistent_ent_id = 0; num_nonexistent_both = 236	
num_correct_ents = 6202; num_total_ents = 18541	
==> Loading redirects index	
    Done loading redirects index	
==> Loading entity wikiid - name map	
  ---> from t7 file: /root/generated/ent_name_id_map.t7	
    Done loading entity name - wikiid. Size thid index = 4306070	
==> Loading crosswikis_wikipedia from file /root/generated/crosswikis_wikipedia_p_e_m.txt	
Processed 2000000 lines. 	
Processed 4000000 lines. 	
Processed 6000000 lines. 	
Processed 8000000 lines. 	
Processed 10000000 lines. 	
Processed 12000000 lines. 	
==> Loading yago index from file /root/generated/yago_p_e_m.txt	
Processed 2000000 lines. 	
Processed 4000000 lines. 	
Processed 6000000 lines. 	
Processed 8000000 lines. 	
Processed 10000000 lines. 	
Processed 12000000 lines. 	
    Done loading index	

Generating test data from wikipedia set 	
Entity Christina (given name) not found. Redirects file needs to be loaded for better performance.	
Christina (given name)	
Entity Christina (given name) not found. Redirects file needs to be loaded for better performance.	
Christina (given name)	
Entity Kirsten not found. Redirects file needs to be loaded for better performance.	
Kirsten	
/root/torch/install/bin/luajit: data_gen/gen_test_train_data/gen_ace_msnbc_aquaint_csv.lua:184: attempt to index local 'it' (a nil value)
stack traceback:
	data_gen/gen_test_train_data/gen_ace_msnbc_aquaint_csv.lua:184: in function 'gen_test_ace'
	data_gen/gen_test_train_data/gen_ace_msnbc_aquaint_csv.lua:202: in main chunk
	[C]: in function 'dofile'
	data_gen/gen_test_train_data/gen_all.lua:13: in main chunk
	[C]: in function 'dofile'
	/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50

Environment:

Dockerfile:

FROM ubuntu:16.04

RUN apt-get update
RUN apt-get install -y git
RUN git clone https://github.com/torch/distro.git /root/torch --recursive
RUN apt-get install -y sudo python-software-properties unzip wget
RUN apt-get install -y lua5.1:amd64 lua5.1-dev:amd64
RUN wget https://luarocks.org/releases/luarocks-3.0.4.tar.gz \
        && tar zxpf luarocks-3.0.4.tar.gz \
        && cd luarocks-3.0.4 \
        && ./configure \
        && sudo bash -c "make bootstrap" \
        && sudo bash -c "luarocks install luasocket"
RUN cd /root/torch && sudo bash -c "./install-deps" && ./install.sh

RUN mkdir /root/generated
RUN git clone https://github.com/dalab/deep-ed /root/deep-ed

setup.sh (a setup script; I ran it after building the Docker container):

curl -c /tmp/cookies "https://drive.google.com/uc?export=download&id=0Bx8d3azIm_ZcbHMtVmRVc1o5TWM" > /tmp/intermed-basic-data.html
curl -L -b /tmp/cookies "https://drive.google.com$(cat /tmp/intermed-basic-data.html | grep -Po 'uc-download-link" [^>]* href="\K[^"]*' | sed 's/\&amp;/\&/g')" > /root/basic_data.zip
cd /root && unzip basic_data.zip

curl -c /tmp/cookies "https://drive.google.com/uc?export=download&id=0B7XkCwpI5KDYNlNUTTlSS21pQmM" > /tmp/intermed-w2v.html
curl -L -b /tmp/cookies "https://drive.google.com$(cat /tmp/intermed-w2v.html | grep -Po 'uc-download-link" [^>]* href="\K[^"]*' | sed 's/\&amp;/\&/g')" > /root/GoogleNews-vectors-negative300.bin.gz
cd /root && gunzip GoogleNews-vectors-negative300.bin.gz
mv /root/GoogleNews-vectors-negative300.bin /root/basic_data/wordEmbeddings/Word2Vec

luarocks install tds
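As shown, setup.sh scrapes the Google Drive confirmation page and has no `set -e`, so if the scrape fails, curl can quietly save an HTML page under the .zip or .gz name and the script keeps going. A minimal sanity check, sketched here under that assumption, is to inspect the first two bytes of each download before unpacking it: ZIP archives begin with "PK" (hex 50 4b) and gzip files with hex 1f 8b. The helper name `has_magic` is hypothetical and not part of deep-ed; the paths are the ones used in setup.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of deep-ed): succeed only if FILE exists
# and its first two bytes match EXPECTED, given as lowercase hex.
has_magic() {
  local file="$1" expected="$2" actual
  [ -f "$file" ] || return 1
  # od prints the bytes as hex; tr strips the spaces and the newline.
  actual=$(head -c 2 "$file" | od -An -tx1 | tr -d ' \n')
  [ "$actual" = "$expected" ]
}

# Run right after the curl calls in setup.sh, before unzip/gunzip.
has_magic /root/basic_data.zip 504b \
  || echo "WARNING: /root/basic_data.zip is not a ZIP archive" >&2
has_magic /root/GoogleNews-vectors-negative300.bin.gz 1f8b \
  || echo "WARNING: the Word2Vec download is not a gzip file" >&2
```

If either warning appears, re-running the corresponding curl command before continuing with the later steps is a reasonable first move.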

Docker version (on the host machine):

$ docker --version
Docker version 18.06.1-ce, build e68fc7a

Cheers,

@octavian-ganea
Contributor

It seems to me that the file opened at gen_ace_msnbc_aquaint_csv.lua:182 does not exist. Can you check that the path defined at gen_ace_msnbc_aquaint_csv.lua:42 contains the test datasets? This should be the case if step 4 was done properly.
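One quick way to follow this suggestion is a small script that looks for each test set's XML file and RawText directory. This is a sketch under assumptions: the dataset names and the `<name>/<name>.xml` plus `<name>/RawText` layout are taken from the tree listing posted in reply, the root path is assumed to be the directory that gen_ace_msnbc_aquaint_csv.lua:42 points to, and the function name `check_wned_datasets` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of deep-ed): verify that each WNED test set
# under ROOT has its XML annotation file and a RawText directory.
# ASSUMPTION: layout is ROOT/<name>/<name>.xml and ROOT/<name>/RawText.
check_wned_datasets() {
  local root="$1" missing=0 name
  for name in ace2004 aquaint clueweb msnbc wikipedia; do
    if [ ! -f "$root/$name/$name.xml" ]; then
      echo "missing: $root/$name/$name.xml" >&2
      missing=1
    fi
    if [ ! -d "$root/$name/RawText" ]; then
      echo "missing: $root/$name/RawText/" >&2
      missing=1
    fi
  done
  return "$missing"
}

# ASSUMPTION: this is the path configured at gen_ace_msnbc_aquaint_csv.lua:42.
if check_wned_datasets /root/basic_data/test_datasets/wned-datasets; then
  echo "all WNED test sets present"
else
  echo "some WNED test files are missing" >&2
fi
```

Note that this only confirms the top-level layout; it does not check that every document referenced in an XML file has a matching file under RawText.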

@titsuki
Author

titsuki commented Feb 17, 2019

@octavian-ganea Thanks for your response!

Can you check that the path defined at gen_ace_msnbc_aquaint_csv.lua:42 contains the test datasets?

Here is the output of tree:

# tree /root/basic_data/test_datasets/wned-datasets/
/root/basic_data/test_datasets/wned-datasets/
|-- README
|-- WHERE_TO_GET_THIS_DATA~
|-- ace2004
|   |-- RawText
|   |   |-- 20000715_AFP_ARB.0072.eng
|   |   |-- 20000815_AFP_ARB.0071.eng
|   |   |-- 20001015_AFP_ARB.0053.eng
|   |   |-- 20001015_AFP_ARB.0229.eng
|   |   |-- 20001115_AFP_ARB.0013.eng
|   |   |-- 20001115_AFP_ARB.0030.eng
|   |   |-- 20001115_AFP_ARB.0060.eng
|   |   |-- 20001115_AFP_ARB.0061.eng
|   |   |-- 20001115_AFP_ARB.0065.eng
|   |   |-- 20001115_AFP_ARB.0072.eng
|   |   |-- 20001115_AFP_ARB.0089.eng
|   |   |-- 20001115_AFP_ARB.0093.eng
|   |   |-- 20001115_AFP_ARB.0184.eng
|   |   |-- 20001115_AFP_ARB.0210.eng
|   |   |-- 20001115_AFP_ARB.0212.eng
|   |   |-- 20001115_AFP_ARB.0217.eng
|   |   |-- APW20001001.2021.0521
|   |   |-- APW20001002.0615.0146
|   |   |-- APW20001016.1325.0321
|   |   |-- APW20001017.1313.0396
|   |   |-- APW20001022.1735.0376
|   |   |-- APW20001023.2100.0686
|   |   |-- APW20001102.1223.0376
|   |   |-- APW20001120.1450.0376
|   |   |-- APW20001127.1346.0419
|   |   |-- APW20001130.2108.0849
|   |   |-- APW20001202.0257.0120
|   |   |-- APW20001203.1456.0329
|   |   |-- APW20001207.2118.0838
|   |   |-- APW20001208.1126.0362
|   |   |-- APW20001211.0507.0196
|   |   |-- APW20001216.2012.0590
|   |   |-- APW20001218.2221.0727
|   |   |-- APW20001219.1316.0416
|   |   |-- APW20001225.2035.0477
|   |   |-- NYT20001002.1754.0290
|   |   |-- NYT20001101.2212.0429
|   |   |-- NYT20001106.1705.0187
|   |   |-- NYT20001109.1946.0315
|   |   |-- NYT20001123.1511.0062
|   |   |-- NYT20001124.2050.0257
|   |   |-- NYT20001125.1558.0117
|   |   |-- NYT20001129.2040.0383
|   |   |-- NYT20001217.2241.0165
|   |   |-- PRI20001031.2000.1824
|   |   |-- PRI20001122.2000.0320
|   |   |-- PRI20001128.2000.0055
|   |   |-- PRI20001201.2000.1828
|   |   |-- VOA20001020.2100.1853
|   |   |-- VOA20001129.2000.0364
|   |   |-- VOA20001208.2000.1275
|   |   |-- VOA20001220.2000.0060
|   |   |-- VOA20001223.2000.0139
|   |   |-- chtb_165.eng
|   |   |-- chtb_171.eng
|   |   |-- chtb_227.eng
|   |   `-- chtb_267.eng
|   `-- ace2004.xml
|-- aquaint
|   |-- RawText
|   |   |-- APW19980603_0791.htm
|   |   |-- APW19980603_1617.htm
|   |   |-- APW19980604_0787.htm
|   |   |-- APW19980610_0111.htm
|   |   |-- APW19980611_0774.htm
|   |   |-- APW19980614_0031.htm
|   |   |-- APW19980615_0417.htm
|   |   |-- APW19980620_0458.htm
|   |   |-- APW19980624_0436.htm
|   |   |-- APW19980624_0607.htm
|   |   |-- APW19980625_1136.htm
|   |   |-- APW19980627_0596.htm
|   |   |-- APW19980709_0263.htm
|   |   |-- APW19980713_0449.htm
|   |   |-- APW19980808_0196.htm
|   |   |-- APW19980811_0512.htm
|   |   |-- APW19980816_0994.htm
|   |   |-- APW19980824_0827.htm
|   |   |-- APW19980903_1073.htm
|   |   |-- APW19980917_0818.htm
|   |   |-- APW19980930_0284.htm
|   |   |-- APW19980930_0522.htm
|   |   |-- APW19981001_0866.htm
|   |   |-- APW19981010_0354.htm
|   |   |-- APW19981020_1367.htm
|   |   |-- APW19981022_0630.htm
|   |   |-- APW19981022_0710.htm
|   |   |-- APW19981026_0096.htm
|   |   |-- APW19981106_0920.htm
|   |   |-- APW19981109_0140.htm
|   |   |-- APW19981109_0152.htm
|   |   |-- APW19981109_0440.htm
|   |   |-- APW19981109_0464.htm
|   |   |-- APW19981109_1089.htm
|   |   |-- APW19981109_1172.htm
|   |   |-- APW19981113_0500.htm
|   |   |-- APW19981113_0729.htm
|   |   |-- APW19981119_0585.htm
|   |   |-- APW19981120_1056.htm
|   |   |-- APW19981130_0743.htm
|   |   |-- APW19981210_0433.htm
|   |   |-- APW19981215_1083.htm
|   |   |-- APW19990120_0179.htm
|   |   |-- APW19990203_0315.htm
|   |   |-- APW19990519_0141.htm
|   |   |-- APW19990526_0131.htm
|   |   |-- APW19990827_0137.htm
|   |   |-- APW19990827_0184.htm
|   |   |-- APW20000303_0067.htm
|   |   `-- APW20000312_0050.htm
|   `-- aquaint.xml
|-- clueweb
|   |-- RawText
|   |   |-- clueweb12-0500wb-04-24340
|   |   |-- clueweb12-0500wb-07-07346
|   |   |-- clueweb12-0500wb-07-08673
|   |   |-- clueweb12-0500wb-07-08677
|   |   |-- clueweb12-0500wb-07-11872
|   |   |-- clueweb12-0500wb-07-28725
|   |   |-- clueweb12-0500wb-10-30439
|   |   |-- clueweb12-0500wb-16-19356
|   |   |-- clueweb12-0500wb-16-24407
|   |   |-- clueweb12-0500wb-17-08784
|   |   |-- clueweb12-0500wb-17-27238
|   |   |-- clueweb12-0500wb-19-13842
|   |   |-- clueweb12-0500wb-30-10322
|   |   |-- clueweb12-0500wb-30-19142
|   |   |-- clueweb12-0500wb-31-00905
|   |   |-- clueweb12-0500wb-31-03359
|   |   |-- clueweb12-0500wb-31-14123
|   |   |-- clueweb12-0500wb-33-03178
|   |   |-- clueweb12-0500wb-35-02519
|   |   |-- clueweb12-0500wb-35-21293
|   |   |-- clueweb12-0500wb-35-24654
|   |   |-- clueweb12-0500wb-35-33000
|   |   |-- clueweb12-0500wb-35-33005
|   |   |-- clueweb12-0500wb-37-00364
|   |   |-- clueweb12-0500wb-37-04409
|   |   |-- clueweb12-0500wb-38-02702
|   |   |-- clueweb12-0500wb-39-21685
|   |   |-- clueweb12-0500wb-54-21485
|   |   |-- clueweb12-0500wb-61-15900
|   |   |-- clueweb12-0500wb-62-09545
|   |   |-- clueweb12-0500wb-62-19286
|   |   |-- clueweb12-0500wb-68-02401
|   |   |-- clueweb12-0500wb-68-22591
|   |   |-- clueweb12-0500wb-68-30102
|   |   |-- clueweb12-0500wb-70-20598
|   |   |-- clueweb12-0500wb-72-07757
|   |   |-- clueweb12-0500wb-74-13854
|   |   |-- clueweb12-0500wb-74-14380
|   |   |-- clueweb12-0500wb-74-15300
|   |   |-- clueweb12-0500wb-77-31039
|   |   |-- clueweb12-0500wb-84-00046
|   |   |-- clueweb12-0500wb-96-03580
|   |   |-- clueweb12-0500wb-96-11959
|   |   |-- clueweb12-0500wb-96-14121
|   |   |-- clueweb12-0501wb-00-09041
|   |   |-- clueweb12-0501wb-00-36510
|   |   |-- clueweb12-0501wb-01-15462
|   |   |-- clueweb12-0501wb-03-25958
|   |   |-- clueweb12-0501wb-04-01041
|   |   |-- clueweb12-0501wb-04-06229
|   |   |-- clueweb12-0501wb-04-09196
|   |   |-- clueweb12-0501wb-04-30288
|   |   |-- clueweb12-0501wb-05-05698
|   |   |-- clueweb12-0501wb-05-16863
|   |   |-- clueweb12-0501wb-05-17155
|   |   |-- clueweb12-0501wb-05-31052
|   |   |-- clueweb12-0501wb-06-28661
|   |   |-- clueweb12-0501wb-09-02087
|   |   |-- clueweb12-0501wb-09-25340
|   |   |-- clueweb12-0501wb-12-23529
|   |   |-- clueweb12-0501wb-16-00982
|   |   |-- clueweb12-0501wb-16-20050
|   |   |-- clueweb12-0501wb-17-15597
|   |   |-- clueweb12-0501wb-18-10891
|   |   |-- clueweb12-0501wb-18-11681
|   |   |-- clueweb12-0501wb-21-13304
|   |   |-- clueweb12-0501wb-21-25436
|   |   |-- clueweb12-0501wb-23-07710
|   |   |-- clueweb12-0501wb-25-01939
|   |   |-- clueweb12-0501wb-25-03448
|   |   |-- clueweb12-0501wb-25-11404
|   |   |-- clueweb12-0501wb-27-26993
|   |   |-- clueweb12-0501wb-27-33204
|   |   |-- clueweb12-0501wb-29-03710
|   |   |-- clueweb12-0501wb-29-11725
|   |   |-- clueweb12-0501wb-29-32464
|   |   |-- clueweb12-0501wb-30-29261
|   |   |-- clueweb12-0501wb-31-16772
|   |   |-- clueweb12-0501wb-31-17194
|   |   |-- clueweb12-0501wb-31-22539
|   |   |-- clueweb12-0501wb-31-27451
|   |   |-- clueweb12-0501wb-33-16523
|   |   |-- clueweb12-0501wb-34-00945
|   |   |-- clueweb12-0501wb-34-34376
|   |   |-- clueweb12-0501wb-35-23286
|   |   |-- clueweb12-0501wb-37-01789
|   |   |-- clueweb12-0501wb-38-01436
|   |   |-- clueweb12-0501wb-39-18485
|   |   |-- clueweb12-0501wb-39-26619
|   |   |-- clueweb12-0501wb-40-14762
|   |   |-- clueweb12-0501wb-40-22301
|   |   |-- clueweb12-0501wb-41-31320
|   |   |-- clueweb12-0501wb-42-15997
|   |   |-- clueweb12-0501wb-42-25891
|   |   |-- clueweb12-0501wb-43-13087
|   |   |-- clueweb12-0501wb-44-00170
|   |   |-- clueweb12-0501wb-45-19647
|   |   |-- clueweb12-0501wb-46-21729
|   |   |-- clueweb12-0501wb-48-00449
|   |   |-- clueweb12-0501wb-48-19861
|   |   |-- clueweb12-0501wb-49-25960
|   |   |-- clueweb12-0501wb-52-13555
|   |   |-- clueweb12-0501wb-52-32516
|   |   |-- clueweb12-0501wb-54-22871
|   |   |-- clueweb12-0501wb-55-00003
|   |   |-- clueweb12-0501wb-55-06011
|   |   |-- clueweb12-0501wb-56-10514
|   |   |-- clueweb12-0501wb-56-16789
|   |   |-- clueweb12-0501wb-58-21067
|   |   |-- clueweb12-0501wb-59-11649
|   |   |-- clueweb12-0501wb-61-27562
|   |   |-- clueweb12-0501wb-62-15643
|   |   |-- clueweb12-0501wb-64-14081
|   |   |-- clueweb12-0501wb-65-12193
|   |   |-- clueweb12-0501wb-66-24342
|   |   |-- clueweb12-0501wb-67-13782
|   |   |-- clueweb12-0501wb-68-12348
|   |   |-- clueweb12-0501wb-69-31274
|   |   |-- clueweb12-0501wb-70-05124
|   |   |-- clueweb12-0501wb-70-14463
|   |   |-- clueweb12-0501wb-71-01322
|   |   |-- clueweb12-0501wb-71-06735
|   |   |-- clueweb12-0501wb-72-08395
|   |   |-- clueweb12-0501wb-73-35834
|   |   |-- clueweb12-0501wb-75-15053
|   |   |-- clueweb12-0501wb-76-00560
|   |   |-- clueweb12-0501wb-76-11292
|   |   |-- clueweb12-0501wb-76-21404
|   |   |-- clueweb12-0501wb-76-28210
|   |   |-- clueweb12-0501wb-77-15901
|   |   |-- clueweb12-0501wb-78-14351
|   |   |-- clueweb12-0501wb-78-19265
|   |   |-- clueweb12-0501wb-79-00401
|   |   |-- clueweb12-0501wb-79-05134
|   |   |-- clueweb12-0501wb-80-04105
|   |   |-- clueweb12-0501wb-80-17824
|   |   |-- clueweb12-0501wb-80-27834
|   |   |-- clueweb12-0501wb-81-15537
|   |   |-- clueweb12-0501wb-83-07261
|   |   |-- clueweb12-0501wb-83-18207
|   |   |-- clueweb12-0501wb-85-11282
|   |   |-- clueweb12-0501wb-86-16603
|   |   |-- clueweb12-0501wb-86-19365
|   |   |-- clueweb12-0501wb-86-24847
|   |   |-- clueweb12-0501wb-86-25362
|   |   |-- clueweb12-0501wb-87-14877
|   |   |-- clueweb12-0501wb-88-01729
|   |   |-- clueweb12-0501wb-88-22871
|   |   |-- clueweb12-0501wb-91-13151
|   |   |-- clueweb12-0501wb-91-19043
|   |   |-- clueweb12-0501wb-91-19798
|   |   |-- clueweb12-0501wb-92-16378
|   |   |-- clueweb12-0501wb-92-23889
|   |   |-- clueweb12-0501wb-93-01863
|   |   |-- clueweb12-0501wb-93-23297
|   |   |-- clueweb12-0501wb-97-28866
|   |   |-- clueweb12-0501wb-98-00398
|   |   |-- clueweb12-0501wb-98-01695
|   |   |-- clueweb12-0501wb-98-27096
|   |   |-- clueweb12-0502wb-01-07830
|   |   |-- clueweb12-0502wb-01-07839
|   |   |-- clueweb12-0502wb-01-13335
|   |   |-- clueweb12-0502wb-05-04897
|   |   |-- clueweb12-0502wb-06-07392
|   |   |-- clueweb12-0502wb-07-00626
|   |   |-- clueweb12-0502wb-07-25981
|   |   |-- clueweb12-0502wb-08-17623
|   |   |-- clueweb12-0502wb-09-02801
|   |   |-- clueweb12-0502wb-10-02722
|   |   |-- clueweb12-0502wb-11-01409
|   |   |-- clueweb12-0502wb-11-09817
|   |   |-- clueweb12-0502wb-11-10748
|   |   |-- clueweb12-0502wb-12-25515
|   |   |-- clueweb12-0502wb-13-32426
|   |   |-- clueweb12-0502wb-19-13507
|   |   |-- clueweb12-0502wb-22-10515
|   |   |-- clueweb12-0502wb-23-20364
|   |   |-- clueweb12-0502wb-23-20374
|   |   |-- clueweb12-0502wb-24-09989
|   |   |-- clueweb12-0502wb-24-10471
|   |   |-- clueweb12-0502wb-26-30335
|   |   |-- clueweb12-0502wb-27-22535
|   |   |-- clueweb12-0502wb-28-11340
|   |   |-- clueweb12-0502wb-28-36092
|   |   |-- clueweb12-0502wb-30-08900
|   |   |-- clueweb12-0502wb-32-04277
|   |   |-- clueweb12-0502wb-32-21133
|   |   |-- clueweb12-0502wb-32-30117
|   |   |-- clueweb12-0502wb-33-10905
|   |   |-- clueweb12-0502wb-34-07818
|   |   |-- clueweb12-0502wb-35-07067
|   |   |-- clueweb12-0502wb-35-35928
|   |   |-- clueweb12-0502wb-36-04706
|   |   |-- clueweb12-0502wb-36-20589
|   |   |-- clueweb12-0502wb-38-30139
|   |   |-- clueweb12-0502wb-39-16287
|   |   |-- clueweb12-0502wb-39-22053
|   |   |-- clueweb12-0502wb-39-24640
|   |   |-- clueweb12-0502wb-39-31620
|   |   |-- clueweb12-0502wb-42-10179
|   |   |-- clueweb12-0502wb-48-27650
|   |   |-- clueweb12-0502wb-52-07233
|   |   |-- clueweb12-0502wb-53-22806
|   |   |-- clueweb12-0502wb-53-22811
|   |   |-- clueweb12-0502wb-54-30197
|   |   |-- clueweb12-0502wb-56-04891
|   |   |-- clueweb12-0502wb-60-21143
|   |   |-- clueweb12-0502wb-60-32167
|   |   |-- clueweb12-0502wb-61-32124
|   |   |-- clueweb12-0502wb-62-18382
|   |   |-- clueweb12-0502wb-65-07212
|   |   |-- clueweb12-0502wb-65-07334
|   |   |-- clueweb12-0502wb-68-34428
|   |   |-- clueweb12-0502wb-72-29686
|   |   |-- clueweb12-0502wb-74-05238
|   |   |-- clueweb12-0502wb-78-19678
|   |   |-- clueweb12-0502wb-78-19697
|   |   |-- clueweb12-0502wb-83-02837
|   |   |-- clueweb12-0502wb-83-28640
|   |   |-- clueweb12-0502wb-85-15102
|   |   |-- clueweb12-0502wb-87-26082
|   |   |-- clueweb12-0502wb-87-29367
|   |   |-- clueweb12-0502wb-89-17873
|   |   |-- clueweb12-0502wb-90-19859
|   |   |-- clueweb12-0502wb-94-00368
|   |   |-- clueweb12-0502wb-94-30665
|   |   |-- clueweb12-0503wb-00-01032
|   |   |-- clueweb12-0503wb-00-04603
|   |   |-- clueweb12-0503wb-00-04929
|   |   |-- clueweb12-0503wb-00-06989
|   |   |-- clueweb12-0503wb-00-07839
|   |   |-- clueweb12-0503wb-00-07940
|   |   |-- clueweb12-0503wb-00-10109
|   |   |-- clueweb12-0503wb-00-10360
|   |   |-- clueweb12-0503wb-00-11874
|   |   |-- clueweb12-0503wb-00-27980
|   |   |-- clueweb12-0503wb-01-11548
|   |   |-- clueweb12-0503wb-01-22470
|   |   |-- clueweb12-0503wb-01-33875
|   |   |-- clueweb12-0503wb-03-11022
|   |   |-- clueweb12-0503wb-05-04508
|   |   |-- clueweb12-0503wb-05-29884
|   |   |-- clueweb12-0503wb-06-03758
|   |   |-- clueweb12-0503wb-06-09868
|   |   |-- clueweb12-0503wb-07-25927
|   |   |-- clueweb12-0503wb-08-14873
|   |   |-- clueweb12-0503wb-11-26383
|   |   |-- clueweb12-0503wb-11-26407
|   |   |-- clueweb12-0503wb-11-26409
|   |   |-- clueweb12-0503wb-11-32519
|   |   |-- clueweb12-0503wb-12-00731
|   |   |-- clueweb12-0503wb-12-15032
|   |   |-- clueweb12-0503wb-13-06212
|   |   |-- clueweb12-0503wb-13-18380
|   |   |-- clueweb12-0503wb-16-14284
|   |   |-- clueweb12-0503wb-16-22821
|   |   |-- clueweb12-0503wb-16-28606
|   |   |-- clueweb12-0503wb-17-12889
|   |   |-- clueweb12-0503wb-17-19581
|   |   |-- clueweb12-0503wb-18-01494
|   |   |-- clueweb12-0503wb-19-19449
|   |   |-- clueweb12-0503wb-21-23741
|   |   |-- clueweb12-0503wb-23-30843
|   |   |-- clueweb12-0503wb-23-34273
|   |   |-- clueweb12-0503wb-24-23914
|   |   |-- clueweb12-0503wb-27-18578
|   |   |-- clueweb12-0503wb-28-03992
|   |   |-- clueweb12-0503wb-28-04880
|   |   |-- clueweb12-0503wb-29-03440
|   |   |-- clueweb12-0503wb-32-03304
|   |   |-- clueweb12-0503wb-32-07315
|   |   |-- clueweb12-0503wb-34-01393
|   |   |-- clueweb12-0503wb-34-13315
|   |   |-- clueweb12-0503wb-39-03743
|   |   |-- clueweb12-0503wb-39-13394
|   |   |-- clueweb12-0503wb-39-17456
|   |   |-- clueweb12-0503wb-40-01750
|   |   |-- clueweb12-0503wb-44-06215
|   |   |-- clueweb12-0503wb-45-08115
|   |   |-- clueweb12-0503wb-45-16307
|   |   |-- clueweb12-0503wb-46-16624
|   |   |-- clueweb12-0503wb-46-17619
|   |   |-- clueweb12-0503wb-46-17631
|   |   |-- clueweb12-0503wb-46-17632
|   |   |-- clueweb12-0503wb-49-29942
|   |   |-- clueweb12-0503wb-54-06811
|   |   |-- clueweb12-0503wb-57-06075
|   |   |-- clueweb12-0503wb-57-17777
|   |   |-- clueweb12-0503wb-58-02370
|   |   |-- clueweb12-0503wb-58-12655
|   |   |-- clueweb12-0503wb-59-02552
|   |   |-- clueweb12-0503wb-61-23674
|   |   |-- clueweb12-0503wb-62-12469
|   |   |-- clueweb12-0503wb-63-23259
|   |   |-- clueweb12-0503wb-63-29106
|   |   |-- clueweb12-0503wb-64-08266
|   |   |-- clueweb12-0503wb-65-15894
|   |   |-- clueweb12-0503wb-69-01292
|   |   |-- clueweb12-0503wb-69-12537
|   |   |-- clueweb12-0503wb-69-12742
|   |   |-- clueweb12-0503wb-70-06937
|   |   |-- clueweb12-0503wb-70-07956
|   |   |-- clueweb12-0503wb-70-18257
|   |   |-- clueweb12-0503wb-71-15780
|   |   |-- clueweb12-0503wb-75-19080
|   |   |-- clueweb12-0503wb-76-28205
|   |   |-- clueweb12-0503wb-77-04955
|   |   |-- clueweb12-0503wb-77-13425
|   |   |-- clueweb12-0503wb-77-27166
|   |   |-- clueweb12-0503wb-78-21632
|   |   |-- clueweb12-0503wb-79-09330
|   |   |-- clueweb12-0503wb-80-31901
|   |   |-- clueweb12-0503wb-84-16931
|   |   |-- clueweb12-0503wb-84-16932
|   |   |-- clueweb12-0503wb-89-28933
|   |   |-- clueweb12-0503wb-91-03655
|   |   |-- clueweb12-0503wb-96-05351
|   |   |-- clueweb12-0503wb-96-11521
|   |   |-- clueweb12-0503wb-97-01049
|   |   `-- clueweb12-0503wb-97-14877
|   |-- clueweb-name2bracket.tsv
|   |-- clueweb-result-summary.tsv.csv
|   `-- clueweb.xml
|-- msnbc
|   |-- RawText
|   |   |-- 13259309
|   |   |-- 16384904
|   |   |-- 16417540
|   |   |-- 16442287
|   |   |-- 16442342
|   |   |-- 16443053
|   |   |-- 16444023
|   |   |-- 16444229
|   |   |-- 16444287
|   |   |-- 16447201
|   |   |-- 16447720
|   |   |-- 16451112
|   |   |-- 16451212
|   |   |-- 16451635
|   |   |-- 16452612
|   |   |-- 16453733
|   |   |-- 16454203
|   |   |-- 16454435
|   |   |-- 16455207
|   |   `-- 3683270
|   `-- msnbc.xml
`-- wikipedia
    |-- RawText
    |   |-- 1966#U201368_Liga_Leumit
    |   |-- 1994_Winter_Olympics_opening_ceremony
    |   |-- 1996_Big_12_Championship_Game
    |   |-- 2009_European_Pairs_Speedway_Championship
    |   |-- 2009_Superfinalen
    |   |-- 2009_Team_Speedway_Junior_European_Championship
    |   |-- 2010_Marshall_Thundering_Herd_football_team
    |   |-- 2010_NASCAR_Canadian_Tire_Series_season
    |   |-- 2011_Valencian_Community_motorcycle_Grand_Prix
    |   |-- 4769_Castalia
    |   |-- A_Trip_Down_Memory_Lane
    |   |-- Aaron_Thomas_(cricketer)
    |   |-- Abbey_Park,_Nottinghamshire
    |   |-- Alabama_State_Route_13
    |   |-- Alessandro_Gramigni
    |   |-- Alexander_MacDonnell,_3rd_Earl_of_Antrim
    |   |-- Alfred_Conkling_Coxe,_Sr.
    |   |-- Alfred_Schickel
    |   |-- Andrea_Giganti
    |   |-- Andrew_Carter_(cricketer)
    |   |-- Andrew_Hele
    |   |-- Andrew_Procter_(cricketer)
    |   |-- Andy_Flynn_(footballer)
    |   |-- Ante-chapel
    |   |-- Antonio_Rossi
    |   |-- Appollo_(dog)
    |   |-- Arnold_Townsend
    |   |-- Arthur_Keegan
    |   |-- Assembly_of_European_Wine-producing_Regions
    |   |-- Atiq-ul-Rehman
    |   |-- Augustus_Simon_Frazer
    |   |-- AutoTrack
    |   |-- BLU-109_bomb
    |   |-- Barrett_Green
    |   |-- Bastille_discography
    |   |-- Battle_of_Vila_Velha
    |   |-- Beeren_Island
    |   |-- Bering_Sea_Squadron
    |   |-- Big_Blue_River_(Indiana)
    |   |-- Bill_Schulz
    |   |-- Bioneers
    |   |-- Black_Lake_Bayou
    |   |-- Bob_Coverdale
    |   |-- Bradley_Dale_Peveto
    |   |-- Brian_Tamberlin
    |   |-- Bulgarian_Black_Sea_Coast
    |   |-- CA_Saint-#U00c9tienne_Loire_Sud_Rugby
    |   |-- Calumet_Region
    |   |-- Cave_Rock_Tunnel
    |   |-- Cecelia_Joyce
    |   |-- Central_Appalachian_pine-oak_rocky_woodland
    |   |-- Central_Lakes_State_Trail
    |   |-- Ch#U00e2teau_d'Oiron
    |   |-- Charles_Fitzgerald_(rugby)
    |   |-- Chetco_people
    |   |-- Children_in_Need_Rocks_Manchester
    |   |-- Chippenham_United_F.C.
    |   |-- Chris_Rushworth
    |   |-- Christine_(name)
    |   |-- Christopher_Andrus
    |   |-- Clara_Nordstr#U00f6m
    |   |-- Claudiopolis_(Cilicia)
    |   |-- Cody_monoplane
    |   |-- Colin_Evans_(rugby)
    |   |-- Colombo_West_Electoral_District
    |   |-- Colombophis
    |   |-- Colorado_State_Highway_94
    |   |-- Commonwealth_men
    |   |-- Confessin'
    |   |-- Country_blues
    |   |-- Cyclone_Taylor_Trophy
    |   |-- Czech_Republic_men's_national_ice_hockey_team
    |   |-- D-block_contraction
    |   |-- Daniel_Bovet
    |   |-- Daniel_Levy_(politician)
    |   |-- Darren_Shadford
    |   |-- David_West_(basketball)
    |   |-- Davit_Kubriashvili
    |   |-- Dedi_I,_Margrave_of_the_Saxon_Ostmark
    |   |-- Dennis_Pilgrim
    |   |-- Derek_Morgan_(cricketer)
    |   |-- Derrick_Schofield
    |   |-- Diabolique_(band)
    |   |-- Division_of_Port_Adelaide
    |   |-- Donald_Hogarth
    |   |-- Doug_Insole
    |   |-- Doug_Melvin_(rower)
    |   |-- Douglas_Dickinson
    |   |-- EMD_E8
    |   |-- East_Mississippi_State_Hospital
    |   |-- El_Madrid_de_los_Austrias
    |   |-- Electoral_district_of_Colton
    |   |-- Electoral_district_of_Mount_Hawthorn
    |   |-- Electoral_district_of_Murray-Darling
    |   |-- Electoral_district_of_Wembley_Beaches
    |   |-- Electoral_division_of_Apsley
    |   |-- Empower_MediaMarketing
    |   |-- Energy_in_Sudan
    |   |-- Enticho_(woreda)
    |   |-- Evelyn_Fanshawe
    |   |-- Exchange_Quay_Metrolink_station
    |   |-- Ficoll
    |   |-- Flavius_Justus
    |   |-- Florida_Gulf_Coast_Eagles_men's_basketball
    |   |-- Frances_Carpenter
    |   |-- Frank_A._Moore
    |   |-- Frank_Coombs
    |   |-- Frank_Mortimer
    |   |-- Frank_S._Pepper
    |   |-- Fred_J._Hume_Award
    |   |-- Fresia,_Chile
    |   |-- Furanocoumarin
    |   |-- G-sharp_major
    |   |-- Gabriel_Bouvery
    |   |-- Gao_Wei
    |   |-- Gemaal_Hussain
    |   |-- Gender_binary
    |   |-- Genesis_Group
    |   |-- George_Clifford_Wilson
    |   |-- George_Waddell
    |   |-- Gerwyn_Edwards
    |   |-- Giovanni_Battista_Landolina
    |   |-- Gmina_Jaraczewo
    |   |-- Gmina_Przedecz
    |   |-- Gmina_Tucz#U0119py
    |   |-- Goh_Seng_Choo_Gallery
    |   |-- Goldie_Hexagon_Racing
    |   |-- Gran_Omar
    |   |-- Greater_London_Council_election,_1970
    |   |-- Green_Lane_railway_station
    |   |-- Gregg_Brandon
    |   |-- Hagop_Sandaldjian
    |   |-- Halsey_Beshears
    |   |-- Hama_Yumi
    |   |-- Harry_Hooper_(footballer_born_1910)
    |   |-- Harry_Taylor_(rugby_league)
    |   |-- Harvard_Crimson_men's_lacrosse
    |   |-- Hassanine_Sebei
    |   |-- Heikki_Kovalainen
    |   |-- Hittin'_the_Trail_for_Hallelujah_Land
    |   |-- Hockley_Valley_Provincial_Nature_Reserve
    |   |-- Hong_Kong_Family_Welfare_Society
    |   |-- House_of_Palatinate-Birkenfeld
    |   |-- Houston_College_Classic
    |   |-- Hugh_Waddell_(rugby_union)
    |   |-- Hughie_Wilson
    |   |-- Human_image_synthesis
    |   |-- Hunterdon_Plateau
    |   |-- Iemasa_Tokugawa
    |   |-- Inland_Waterways_Authority_of_India
    |   |-- Inspectorates-General_(Turkey)
    |   |-- Interstate_691
    |   |-- Iowa's_10th_congressional_district
    |   |-- Iowa's_11th_congressional_district
    |   |-- Iowa_Highway_7
    |   |-- Jablanica_(river)
    |   |-- Jack_Kennedy_(hurler)
    |   |-- Jackie_Tyrrell
    |   |-- Jacob_de_Jager
    |   |-- Jacques_Thibaud
    |   |-- James_Barrow
    |   |-- James_Motluk
    |   |-- Jan-Erasmus_Quellinus
    |   |-- Janene_Higgins
    |   |-- Jeanne_d'#U00c9vreux
    |   |-- Jeffris_Hopkins
    |   |-- Jeremy_Davis
    |   |-- Jessica_Mauboy_discography
    |   |-- Ji#U0159#U00ed_T#U0159anovsk#U00fd
    |   |-- Jimmy_Rooney
    |   |-- John_Burton_(political_agent)
    |   |-- John_Moore_(cricketer,_born_1943)
    |   |-- John_Wertheim
    |   |-- Johnny_Moss
    |   |-- Jos#U00e9_Evangelista
    |   |-- Joseph_J._Cannon
    |   |-- Joseph_Smith_(cricketer)
    |   |-- Juan_Cruz_Ochoa
    |   |-- Juan_Cuevas_Perales
    |   |-- Judy_Roderick
    |   |-- Julian_Knowles
    |   |-- Julius_Scriver
    |   |-- June_Preisser
    |   |-- Jutta_Nardenbach
    |   |-- Katrine_Lunde_Haraldsen
    |   |-- Kazuo_Aoki
    |   |-- Kenneth_Willis_Clark_Collection
    |   |-- Kilometre_Zero_(Bucharest)
    |   |-- King_Diamond_discography
    |   |-- Krasi,_Thalassa_Kai_T'_Agori_Mou
    |   |-- Larry_Worrell
    |   |-- Laurie_Johnson_(cricketer)
    |   |-- Law_&_Order_(season_16)
    |   |-- Leading_Creek_(Ohio)
    |   |-- Leighton_Hodges
    |   |-- Live_Nation_UK
    |   |-- Love's_Welcome_at_Bolsover
    |   |-- Love_&_Life_(Mary_J._Blige_album)
    |   |-- Lubov_Egorova
    |   |-- Luc_Alphand
    |   |-- M-79_(Michigan_highway)
    |   |-- MV_Tustumena
    |   |-- Maclay_Murray_&_Spens
    |   |-- Madarihat
    |   |-- Major_League_Baseball_on_TSN
    |   |-- Malcolm_Azania
    |   |-- Malone_Area_Heritage_Museum
    |   |-- Maneer_Mirza
    |   |-- Manfred_Seissler
    |   |-- Manti_National_Forest
    |   |-- Marcel_Hirscher
    |   |-- Marcus_Marvell
    |   |-- Marcus_Thomas_Pius_Gilbert
    |   |-- Margit_Schumann
    |   |-- Margot_Leverett
    |   |-- Marillier_shot
    |   |-- Marksville_culture
    |   |-- Markus_Prock
    |   |-- Mary_O'Connor_(sportsperson)
    |   |-- Matt_Higgins_(ice_hockey)
    |   |-- Matt_Kohler
    |   |-- Maxwell_Hunter
    |   |-- May_Peterson_Thompson
    |   |-- Melville-Saltcoats
    |   |-- Men_at_Work_(season_1)
    |   |-- Messier_49
    |   |-- Michael_Youll
    |   |-- Mike_Smith_(jazz_saxophonist)
    |   |-- Mississippi_Delta_National_Heritage_Area
    |   |-- Mississippi_Hills_National_Heritage_Area
    |   |-- Mondo_2000
    |   |-- Moses_Hamon
    |   |-- Mountadam_Vineyards
    |   |-- NTFS-3G
    |   |-- Neal_Porter
    |   |-- Nebraska_Highway_11
    |   |-- Nelsonic_Industries
    |   |-- Nembrionic
    |   |-- Nether_Poppleton_Tithebarn
    |   |-- New_Manchester
    |   |-- New_York_Yankees_(1936_AFL)
    |   |-- Nidderdale_Way
    |   |-- Noel_Purcell_(water_polo)
    |   |-- Nucleoid
    |   |-- OK_Hotel
    |   |-- Omicron2_Canis_Majoris
    |   |-- Oregon_Route_10
    |   |-- Oriol_Lozano
    |   |-- Paddy_Tuimavave
    |   |-- Panhandle
    |   |-- Parkway_Center_Mall
    |   |-- Party_of_New_Forces
    |   |-- Paul_New
    |   |-- Paul_Roshier
    |   |-- Pawnee_Rangers
    |   |-- Peire_Pelet
    |   |-- Penn_State_Lady_Lions_basketball
    |   |-- Penske_PC-22
    |   |-- Peter_Rochford
    |   |-- Peter_Scott_(cricketer)
    |   |-- Petorca_Province
    |   |-- Philip_Threlfall
    |   |-- Progressive_Democratic_Party_(Tunisia)
    |   |-- Province_of_Calatayud
    |   |-- Putin's_rynda
    |   |-- R#U00edo_Verde,_Chile
    |   |-- R._F._Bayford
    |   |-- Rabbit_River
    |   |-- Rainer_Polak
    |   |-- Rally_Ireland
    |   |-- Randy_Turner
    |   |-- Rapp_Road_Community_Historic_District
    |   |-- Richard_of_Salerno
    |   |-- Rio_Grande_Association
    |   |-- River_Bride
    |   |-- Robert_Alexander_(rugby_union_and_cricket)
    |   |-- Roger_Clitheroe
    |   |-- Rogier_Koordes
    |   |-- Roland_Hyatt
    |   |-- Roman_Catholic_Diocese_of_Superior
    |   |-- Ron_Ryder
    |   |-- Roy_Vincent
    |   |-- Ruby_B._DeMesme
    |   |-- Rugby_union_in_Asia
    |   |-- Satavahana_Express
    |   |-- Scotch_and_Soda
    |   |-- Shaka_Smart
    |   |-- Sherwin_Campbell
    |   |-- Shorkot
    |   |-- Simon_Hugh_Holmes
    |   |-- Simon_L._Adler
    |   |-- Solemn_League_and_Covenant
    |   |-- Sopwith_1919_Schneider_Cup_Seaplane
    |   |-- Source_of_the_Nile_(board_game)
    |   |-- South_Carolina_Highway_200
    |   |-- South_East_Lancashire_(UK_Parliament_constituency)
    |   |-- South_Gippsland_Highway
    |   |-- Southwest_Tennessee_Development_District
    |   |-- Spectacled_Tern
    |   |-- Spondylosoma
    |   |-- St._Michael_the_Archangel_Church_(Cleveland,_Ohio)
    |   |-- Statue_of_Europe
    |   |-- Steadfastness_and_Confrontation_Front
    |   |-- Steep_Falls,_Maine
    |   |-- Stephen_Martin_(field_hockey)
    |   |-- Steve_Durbano
    |   |-- Stewart_Hutton
    |   |-- Sturgeon_House
    |   |-- Taifa_of_Badajoz
    |   |-- Taylor_Pond_Wild_Forest
    |   |-- Teddy_Holland
    |   |-- Texas_State_Highway_110
    |   |-- The_Great_White_Hope_(film)
    |   |-- Theramine
    |   |-- Thomas_Land_(Drayton_Manor)
    |   |-- Thomas_Pearsall_(cricketer)
    |   |-- Tim_Hemp
    |   |-- Todd_Wider
    |   |-- Tom_Baxter_(footballer_born_1903)
    |   |-- Tommy_Cairns
    |   |-- Tony_Blanco
    |   |-- Tony_Drake
    |   |-- Trade_Lines_(newspaper)
    |   |-- Turkey_River_(Iowa)
    |   |-- Ubaoner
    |   |-- Ulrike_Maier
    |   |-- Urla_Clashes
    |   |-- Vi_vil_oss_et_land
    |   |-- Victor_Croome
    |   |-- Vijaya_Dasa
    |   |-- W#U00fcrttemberger
    |   |-- Wesley_Brown_Field_House
    |   |-- Wheeling_Creek_(Ohio)
    |   |-- Whitesands_Bay_(Pembrokeshire)
    |   |-- Wijnand_van_der_Sanden
    |   |-- William_Carr_Lane
    |   |-- William_Wood,_1st_Baron_Hatherley
    |   |-- Wolf_Prize
    |   |-- World_Without_Superman
    |   |-- X_Corps_(Union_Army)
    |   |-- Ya'akov_Riftin
    |   |-- Yakov_Malkiel
    |   |-- Yellowback_stingaree
    |   |-- Yves_Fortier_(lawyer)
    |   `-- Zielona_G#U00f3ra_(parliamentary_constituency)
    |-- wikipedia-name2bracket.tsv
    `-- wikipedia.xml

10 directories, 802 files

@octavian-ganea
Contributor

Yes, but you should debug for yourself why line 182 is not opening a valid file.
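One quick way to find which documents a script cannot open is to compare the names listed in the dataset XML with the files that actually exist in `RawText`. This is a minimal bash sketch. It assumes each document appears in the XML as `<document docName="...">`, which is an assumption about the WNED format and has not been checked against the code at line 182. `list_missing_docs` is a hypothetical helper, not part of deep-ed.

```shell
# Hypothetical check: list documents named in a WNED-style XML file that have
# no file with exactly that name in the RawText directory.
# ASSUMPTION: each document appears as <document docName="...">; adjust the
# grep pattern if your XML differs.
list_missing_docs() {
  local xml=$1 dir=$2 name
  grep -o 'docName="[^"]*"' "$xml" |
    sed -e 's/^docName="//' -e 's/"$//' \
        -e "s/&apos;/'/g" -e 's/&quot;/"/g' \
        -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&amp;/\&/g' |
    sort -u |
    while IFS= read -r name; do
      # Report names whose file is missing (e.g. extracted under another name).
      [ -e "$dir/$name" ] || printf '%s\n' "$name"
    done
}
```

Run it from the directory that holds both `wikipedia.xml` and `RawText` (as in the tree above), e.g. `list_missing_docs wikipedia.xml RawText`. Every name it prints has no file with exactly that name on disk. The `&amp;` entity is decoded last so that an escaped sequence such as `&amp;lt;` is not decoded twice.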

@titsuki
Author

titsuki commented Feb 17, 2019

@octavian-ganea
I found that some filenames in basic_data.zip differ from those in the original WNED archive (https://www.dropbox.com/s/987hmjdoq0cql9z/WNED.tar.gz).

For example,

WNED > wned-datasets > wikipedia:

RawText/Zielona_Góra_(parliamentary_constituency)

basic_data.zip:

RawText/Zielona_G#U00f3ra_(parliamentary_constituency)

So I replaced the affected files from basic_data.zip with the original WNED ones.

After that, step 10 passes:

# th data_gen/gen_test_train_data/gen_all.lua -root_data_dir /root/
==> Loading redirects index	
    Done loading redirects index	
==> Loading entity wikiid - name map	
  ---> from t7 file: /root/generated/ent_name_id_map.t7	
    Done loading entity name - wikiid. Size thid index = 4306070	
==> Loading crosswikis_wikipedia from file /root/generated/crosswikis_wikipedia_p_e_m.txt	
Processed 2000000 lines. 	
Processed 4000000 lines. 	
Processed 6000000 lines. 	
Processed 8000000 lines. 	
Processed 10000000 lines. 	
Processed 12000000 lines. 	
==> Loading yago index from file /root/generated/yago_p_e_m.txt	
Processed 2000000 lines. 	
Processed 4000000 lines. 	
Processed 6000000 lines. 	
Processed 8000000 lines. 	
Processed 10000000 lines. 	
Processed 12000000 lines. 	
    Done loading index	

Generating test data from AIDA set 	
Entity Derek Ryan not found. Redirects file needs to be loaded for better performance.	
Entity Iván García not found. Redirects file needs to be loaded for better performance.	
Entity Akhbar not found. Redirects file needs to be loaded for better performance.	
Entity Michael Andersson not found. Redirects file needs to be loaded for better performance.	
Entity Michael Andersson not found. Redirects file needs to be loaded for better performance.	
Entity Oksana Grishina not found. Redirects file needs to be loaded for better performance.	
Entity Craig Brown not found. Redirects file needs to be loaded for better performance.	
Entity John Collins not found. Redirects file needs to be loaded for better performance.	
Entity International Boxing Association not found. Redirects file needs to be loaded for better performance.	
Entity Ramón Ramírez not found. Redirects file needs to be loaded for better performance.	
Done validation testA : 	
num_nme = 1126; num_nonexistent_ent_title = 3189	
num_nonexistent_ent_id = 0; num_nonexistent_both = 35	
num_correct_ents = 1567; num_total_ents = 4791	
Entity World Open not found. Redirects file needs to be loaded for better performance.	
Entity Douglas Young not found. Redirects file needs to be loaded for better performance.	
Entity Douglas Young not found. Redirects file needs to be loaded for better performance.	
Entity James Love not found. Redirects file needs to be loaded for better performance.	
Entity Noel Whelan not found. Redirects file needs to be loaded for better performance.	
    Done AIDA.	
num_nme = 2257; num_nonexistent_ent_title = 6255	
num_nonexistent_ent_id = 0; num_nonexistent_both = 72	
num_correct_ents = 2949; num_total_ents = 9276	

Generating train data from AIDA set 	
Entity Craig Brown not found. Redirects file needs to be loaded for better performance.	
Entity International cricketers of South African origin not found. Redirects file needs to be loaded for better performance.	
Entity Jonathan Stark not found. Redirects file needs to be loaded for better performance.	
Entity Carlos Costa not found. Redirects file needs to be loaded for better performance.	
Entity Antonio Esposito not found. Redirects file needs to be loaded for better performance.	
Entity Independence Day (disambiguation) not found. Redirects file needs to be loaded for better performance.	
Entity Independence Day (disambiguation) not found. Redirects file needs to be loaded for better performance.	
Entity Erik Hanson not found. Redirects file needs to be loaded for better performance.	
Entity Erik Hanson not found. Redirects file needs to be loaded for better performance.	
Entity Iván García not found. Redirects file needs to be loaded for better performance.	
Entity Camelot, Chesapeake, Virginia not found. Redirects file needs to be loaded for better performance.	
Entity Jonathan Stark not found. Redirects file needs to be loaded for better performance.	
Entity Gordon Parsons not found. Redirects file needs to be loaded for better performance.	
Entity Xhosa not found. Redirects file needs to be loaded for better performance.	
Entity Xhosa not found. Redirects file needs to be loaded for better performance.	
Entity Jamaat-e-Islami not found. Redirects file needs to be loaded for better performance.	
Entity Ford Escort not found. Redirects file needs to be loaded for better performance.	
Entity Ford Escort not found. Redirects file needs to be loaded for better performance.	
Entity Franz Konrad not found. Redirects file needs to be loaded for better performance.	
Entity Ford Escort not found. Redirects file needs to be loaded for better performance.	
Entity Carlos Costa not found. Redirects file needs to be loaded for better performance.	
Entity Craig Evans not found. Redirects file needs to be loaded for better performance.	
Entity Preston not found. Redirects file needs to be loaded for better performance.	
Entity Superman (disambiguation) not found. Redirects file needs to be loaded for better performance.	
Entity Superman (disambiguation) not found. Redirects file needs to be loaded for better performance.	
Entity Jonathan Stark not found. Redirects file needs to be loaded for better performance.	
Entity Ashta not found. Redirects file needs to be loaded for better performance.	
Entity John Smiley not found. Redirects file needs to be loaded for better performance.	
Entity Derek Ryan not found. Redirects file needs to be loaded for better performance.	
Entity Michael Andersson not found. Redirects file needs to be loaded for better performance.	
Entity Michael Andersson not found. Redirects file needs to be loaded for better performance.	
Entity Oksana Grishina not found. Redirects file needs to be loaded for better performance.	
Entity Derek Ryan not found. Redirects file needs to be loaded for better performance.	
Entity Bandundu not found. Redirects file needs to be loaded for better performance.	
Entity Čelopek not found. Redirects file needs to be loaded for better performance.	
    Done AIDA.	
num_nme = 4855; num_nonexistent_ent_title = 12103	
num_nonexistent_ent_id = 0; num_nonexistent_both = 236	
num_correct_ents = 6202; num_total_ents = 18541	

Generating test data from wikipedia set 	
Entity Christina (given name) not found. Redirects file needs to be loaded for better performance.	
Christina (given name)	
Entity Christina (given name) not found. Redirects file needs to be loaded for better performance.	
Christina (given name)	
Entity Kirsten not found. Redirects file needs to be loaded for better performance.	
Kirsten	
Entity Leslie Townsend not found. Redirects file needs to be loaded for better performance.	
Leslie Townsend	
Entity Leslie Townsend not found. Redirects file needs to be loaded for better performance.	
Leslie Townsend	
Entity U.S. Route 40 not found. Redirects file needs to be loaded for better performance.	
U.S. Route 40	
Entity Ashland, Louisiana not found. Redirects file needs to be loaded for better performance.	
Ashland, Louisiana	
Entity List of Farm to Market Roads in Texas (1–99) not found. Redirects file needs to be loaded for better performance.	
Farm to Market Road 16	
Entity List of Farm to Market Roads in Texas (1–99) not found. Redirects file needs to be loaded for better performance.	
Farm to Market Road 17	
Entity List of Farm to Market Roads in Texas (1–99) not found. Redirects file needs to be loaded for better performance.	
Farm to Market Road 17	
Entity List of state highways in Colorado not found. Redirects file needs to be loaded for better performance.	
State highways in Colorado	
Entity U.S. Route 40 not found. Redirects file needs to be loaded for better performance.	
U.S. Route 40	
Entity U.S. Route 40 not found. Redirects file needs to be loaded for better performance.	
U.S. Route 40	
Entity Thomas Gordon not found. Redirects file needs to be loaded for better performance.	
Thomas Gordon	
Entity Robert Crowley not found. Redirects file needs to be loaded for better performance.	
Robert Crowley	
Entity Tiger (comics) not found. Redirects file needs to be loaded for better performance.	
Tiger (comics)	
Entity Tiger (comics) not found. Redirects file needs to be loaded for better performance.	
Tiger (comics)	
Entity Underdog not found. Redirects file needs to be loaded for better performance.	
Underdogs	
Entity John Tyrrell not found. Redirects file needs to be loaded for better performance.	
John Tyrrell	
Entity List of minor DC Comics characters not found. Redirects file needs to be loaded for better performance.	
Sam Lane (comics)	
Entity Ryan Campbell not found. Redirects file needs to be loaded for better performance.	
Ryan Campbell	
Done wikipedia.	
num_nonexistent_ent_id = 21; num_correct_ents = 6800	

Generating test data from clueweb set 	
Entity Mia Jones (Degrassi: The Next Generation) not found. Redirects file needs to be loaded for better performance.	
Mia Jones (Degrassi: The Next Generation)	
Entity World not found. Redirects file needs to be loaded for better performance.	
World	
Entity The Lord of the Rings: The Fellowship of the Ring not found. Redirects file needs to be loaded for better performance.	
The Lord of the Rings: The Fellowship of the Ring	
Entity Anthrax (disambiguation) not found. Redirects file needs to be loaded for better performance.	
Anthrax (band)	
Entity Anthrax (disambiguation) not found. Redirects file needs to be loaded for better performance.	
Anthrax (band)	
Entity World not found. Redirects file needs to be loaded for better performance.	
World	
Entity Cosmos: A Personal Voyage not found. Redirects file needs to be loaded for better performance.	
Cosmos: A Personal Voyage	
Entity Jumeirah Village not found. Redirects file needs to be loaded for better performance.	
Jumeirah Village	
Done clueweb.	
num_nonexistent_ent_id = 8; num_correct_ents = 11146	

Generating test data from ace2004 set 	
Entity Lujaizui not found. Redirects file needs to be loaded for better performance.	
Lujaizui	
Done ace2004.	
num_nonexistent_ent_id = 1; num_correct_ents = 256	

Generating test data from msnbc set 	
Done msnbc.	
num_nonexistent_ent_id = 0; num_correct_ents = 656	

Generating test data from aquaint set 	
Entity List of radio stations in Nicaragua not found. Redirects file needs to be loaded for better performance.	
List of radio stations in Nicaragua	
Entity List of newspapers in India not found. Redirects file needs to be loaded for better performance.	
List of newspapers in India	
Entity List of fatal bear attacks in North America by decade not found. Redirects file needs to be loaded for better performance.	
List of fatal bear attacks in North America by decade	
Entity Federated State not found. Redirects file needs to be loaded for better performance.	
Federated State	
Entity List of national legal systems not found. Redirects file needs to be loaded for better performance.	
List of national legal systems	
Entity Tender not found. Redirects file needs to be loaded for better performance.	
Tender	
Entity David Richardson not found. Redirects file needs to be loaded for better performance.	
David Richardson	
Done aquaint.	
num_nonexistent_ent_id = 7; num_correct_ents = 720

Is this the expected workaround? If basic_data.zip is corrupted, how were other people able to pass step 10?
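A guess at why others did not hit this: the `#U00f3` form looks like the escape Info-ZIP `unzip` can write for non-ASCII names it cannot represent in the current locale. That would fit a Docker image with no UTF-8 locale configured. This is not confirmed anywhere in this thread. If it is the cause, re-extracting basic_data.zip with a UTF-8 locale set might avoid the escapes altogether (untested here). Another option is to rename the files in place. The sketch below is a hypothetical bash helper: `utf8_from_hex`, `decode_name` and `rename_escaped_files` are made-up names. It assumes that only the four-hex-digit `#Uxxxx` escape occurs and that it is the only difference from the WNED names. Neither assumption has been checked for every file.

```shell
# Hypothetical helpers: undo "#Uxxxx" escapes in file names, e.g.
#   Zielona_G#U00f3ra_(parliamentary_constituency) -> Zielona_Góra_(...)
# ASSUMPTION: only the 4-hex-digit "#Uxxxx" form occurs (a longer "#L..."
# form, if present, is left untouched).

# Print the UTF-8 bytes for a code point given as 4 hex digits, computed by
# hand (octal printf escapes) so the result does not depend on the locale.
utf8_from_hex() {
  local cp=$((16#$1))
  if (( cp < 0x80 )); then
    printf "\\$(printf '%03o' "$cp")"
  elif (( cp < 0x800 )); then
    printf "\\$(printf '%03o' $((0xC0 | (cp >> 6))))"
    printf "\\$(printf '%03o' $((0x80 | (cp & 0x3F))))"
  else
    printf "\\$(printf '%03o' $((0xE0 | (cp >> 12))))"
    printf "\\$(printf '%03o' $((0x80 | ((cp >> 6) & 0x3F))))"
    printf "\\$(printf '%03o' $((0x80 | (cp & 0x3F))))"
  fi
}

# Replace every "#Uxxxx" escape in $1 with the character it stands for.
decode_name() {
  local rest=$1 out=""
  local re='^([^#]*)#U([0-9A-Fa-f]{4})(.*)$'
  while [[ $rest =~ $re ]]; do
    out+="${BASH_REMATCH[1]}$(utf8_from_hex "${BASH_REMATCH[2]}")"
    rest=${BASH_REMATCH[3]}
  done
  printf '%s' "$out$rest"
}

# Rename every escaped file in directory $1; mv -n never overwrites a file
# that already has the decoded name.
rename_escaped_files() {
  local dir=$1 f base new
  for f in "$dir"/*'#U'*; do
    [ -e "$f" ] || continue   # the glob matched nothing
    base=${f##*/}
    new=$(decode_name "$base")
    echo "$base -> $new"
    mv -n -- "$f" "$dir/$new"
  done
}
```

For example, `rename_escaped_files RawText` run from the directory that holds `wikipedia.xml` and `RawText`. The fix confirmed in this thread is still copying the files from the WNED archive. The rename has not been tested on the full dataset.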

@titsuki
Author

titsuki commented Feb 17, 2019

I also confirmed that the dataset statistics are correct.

stats.sh:

wc -l < wned-ace2004.csv
# expected: 257
grep -cP 'GT:\t-1' wned-ace2004.csv
# expected: 20
grep -cP 'GT:\t1,' wned-ace2004.csv
# expected: 217

wc -l < wned-aquaint.csv
# expected: 727
grep -cP 'GT:\t-1' wned-aquaint.csv
# expected: 33
grep -cP 'GT:\t1,' wned-aquaint.csv
# expected: 604

wc -l < wned-msnbc.csv
# expected: 656
grep -cP 'GT:\t-1' wned-msnbc.csv
# expected: 22
grep -cP 'GT:\t1,' wned-msnbc.csv
# expected: 496
# bash stats.sh 
257
20
217
727
33
604
656
22
496
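The same check can report mismatches automatically instead of relying on a visual comparison of the printed numbers. This is a minimal sketch. `check_wned_csv` is a hypothetical helper. It uses bash `$'...'` quoting with plain `grep -c` instead of `grep -P`, and that should count the same lines as the patterns in stats.sh.

```shell
# Hypothetical helper: compare a WNED CSV against expected counts of
# (1) all lines, (2) lines containing "GT:<TAB>-1", (3) lines containing
# "GT:<TAB>1,". Prints OK or DIFF and returns non-zero on a mismatch.
check_wned_csv() {
  local file=$1 want_total=$2 want_m1=$3 want_1=$4 total m1 one
  total=$(( $(wc -l < "$file") ))   # $(( )) drops padding some wc add
  m1=$(grep -c $'GT:\t-1' "$file")
  one=$(grep -c $'GT:\t1,' "$file")
  if (( total == want_total && m1 == want_m1 && one == want_1 )); then
    echo "OK   $file"
  else
    echo "DIFF $file: got $total/$m1/$one, expected $want_total/$want_m1/$want_1"
    return 1
  fi
}
```

With the numbers above, this would be `check_wned_csv wned-ace2004.csv 257 20 217`, and likewise `wned-aquaint.csv 727 33 604` and `wned-msnbc.csv 656 22 496`.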

@titsuki
Author

titsuki commented Mar 2, 2019

@octavian-ganea
Thanks for your response.
With this workaround I was able to get through step 17, so I'll close this issue.

Note

My environment was as follows:

  • lua: 5.1
  • Docker image: nvidia/cuda:9.2-devel
  • cudnn:
    • cudnn-9.2-linux-x64-v7.2.1.38 (downloaded from nvidia)
    • cudnn.torch R7 (lua module)

I also ran into issue #17 and worked around it by deleting the assertions.

@titsuki titsuki closed this as completed Mar 2, 2019