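NLP Starry Sky Intelligent Dialogue Robot Series: Loading the Example Data for the Facebook StarSpace Framework
Facebook StarSpace example script
Let's first look at one of the examples shipped with the official Facebook StarSpace source code, the script classification_ag_news.sh: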
myshuf() {
  perl -MList::Util=shuffle -e 'print shuffle(<>);' "$@";
}
normalize_text() {
  tr '[:upper:]' '[:lower:]' | sed -e 's/^/__label__/g' | \
    sed -e "s/'/ ' /g" -e 's/"//g' -e 's/\./ \. /g' -e 's/<br \/>/ /g' \
        -e 's/,/ , /g' -e 's/(/ ( /g' -e 's/)/ ) /g' -e 's/\!/ \! /g' \
        -e 's/\?/ \? /g' -e 's/\;/ /g' -e 's/\:/ /g' | tr -s " " | myshuf
}
DATASET=(
ag_news
)
MODELDIR=/tmp/starspace/models
DATADIR=/tmp/starspace/data
mkdir -p "${MODELDIR}"
mkdir -p "${DATADIR}"
echo "Downloading dataset ag_news"
if [ ! -f "${DATADIR}/${DATASET[i]}.train" ]
then
  wget -c "https://dl.fbaipublicfiles.com/starspace/ag_news_csv.tar.gz" -O "${DATADIR}/${DATASET[0]}_csv.tar.gz"
  tar -xzvf "${DATADIR}/${DATASET[0]}_csv.tar.gz" -C "${DATADIR}"
  cat "${DATADIR}/${DATASET[0]}_csv/train.csv" | normalize_text > "${DATADIR}/${DATASET[0]}.train"
  cat "${DATADIR}/${DATASET[0]}_csv/test.csv" | normalize_text > "${DATADIR}/${DATASET[0]}.test"
fi
echo "Compiling StarSpace"
make
echo "Start to train on ag_news data:"
./starspace train \
-trainFile "${DATADIR}"/ag_news.train \
-model "${MODELDIR}"/ag_news \
-initRandSd 0.01 \
-adagrad false \
-ngrams 1 \
-lr 0.01 \
-epoch 5 \
-thread 20 \
-dim 10 \
-negSearchLimit 5 \
-trainMode 0 \
-label "__label__" \
-similarity "dot" \
-verbose true
echo "Start to evaluate trained model:"
./starspace test \
-model "${MODELDIR}"/ag_news \
-testFile "${DATADIR}"/ag_news.test \
-ngrams 1 \
-dim 10 \
-label "__label__" \
-thread 10 \
-similarity "dot" \
-trainMode 0 \
-verbose true
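To see what the normalize_text function in the script above does to a single CSV row, the following minimal sketch repeats its tr/sed pipeline without the final shuffle step. The function name normalize_text_noshuf and the sample row are made up for illustration, and the sed expressions are assumed to run under GNU sed, as in the run shown in this article.

```shell
# Same pipeline as normalize_text in classification_ag_news.sh, minus myshuf
# (hypothetical helper name; the shuffle is omitted so the output is deterministic).
normalize_text_noshuf() {
  tr '[:upper:]' '[:lower:]' | sed -e 's/^/__label__/g' | \
    sed -e "s/'/ ' /g" -e 's/"//g' -e 's/\./ \. /g' -e 's/<br \/>/ /g' \
        -e 's/,/ , /g' -e 's/(/ ( /g' -e 's/)/ ) /g' -e 's/\!/ \! /g' \
        -e 's/\?/ \? /g' -e 's/\;/ /g' -e 's/\:/ /g' | tr -s " "
}

# Illustrative row in the same "class","title","description" layout as train.csv.
printf '%s\n' '"3","Fed Holds Rates","Stocks rose, bonds fell."' | normalize_text_noshuf
# prints: __label__3 , fed holds rates , stocks rose , bonds fell .
```

Because every double quote is deleted, the prefix __label__ ends up directly in front of the class index, which is how the first CSV column becomes the StarSpace label token.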
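The core of classification_ag_news.sh is three steps, repeated here from the listing above: download and convert the data, train the model, and evaluate it: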
wget -c "https://dl.fbaipublicfiles.com/starspace/ag_news_csv.tar.gz" -O "${DATADIR}/${DATASET[0]}_csv.tar.gz"
tar -xzvf "${DATADIR}/${DATASET[0]}_csv.tar.gz" -C "${DATADIR}"
cat "${DATADIR}/${DATASET[0]}_csv/train.csv" | normalize_text > "${DATADIR}/${DATASET[0]}.train"
cat "${DATADIR}/${DATASET[0]}_csv/test.csv" | normalize_text > "${DATADIR}/${DATASET[0]}.test"
./starspace train \
-trainFile "${DATADIR}"/ag_news.train \
-model "${MODELDIR}"/ag_news \
-initRandSd 0.01 \
-adagrad false \
-ngrams 1 \
-lr 0.01 \
-epoch 5 \
-thread 20 \
-dim 10 \
-negSearchLimit 5 \
-trainMode 0 \
-label "__label__" \
-similarity "dot" \
-verbose true
echo "Start to evaluate trained model:"
./starspace test \
-model "${MODELDIR}"/ag_news \
-testFile "${DATADIR}"/ag_news.test \
-ngrams 1 \
-dim 10 \
-label "__label__" \
-thread 10 \
-similarity "dot" \
-trainMode 0 \
-verbose true
The output of running the script is as follows:
aistudio@jupyter-112853-2339160:~/Starspace$ bash examples/classification_ag_news.sh
Downloading dataset ag_news
--2021-09-05 12:11:53-- https://dl.fbaipublicfiles.com/starspace/ag_news_csv.tar.gz
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.74.142, 104.22.75.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.74.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11784327 (11M) [application/gzip]
Saving to: ‘/tmp/starspace/data/ag_news_csv.tar.gz’
/tmp/starspace/data/ag_news_csv.tar.g 100%[========================================================================>] 11.24M 1.63MB/s in 9.3s
2021-09-05 12:12:04 (1.21 MB/s) - ‘/tmp/starspace/data/ag_news_csv.tar.gz’ saved [11784327/11784327]
ag_news_csv/
ag_news_csv/train.csv
ag_news_csv/test.csv
ag_news_csv/classes.txt
ag_news_csv/readme.txt
Compiling StarSpace
make: Nothing to be done for 'opt'.
Start to train on ag_news data:
Arguments:
lr: 0.01
dim: 10
epoch: 5
maxTrainTime: 8640000
validationPatience: 10
saveEveryEpoch: 0
loss: hinge
margin: 0.05
similarity: dot
maxNegSamples: 10
negSearchLimit: 5
batchSize: 5
thread: 20
minCount: 1
minCountLabel: 1
label: __label__
label: __label__
ngrams: 1
bucket: 2000000
adagrad: 0
trainMode: 0
fileFormat: fastText
normalizeText: 0
dropoutLHS: 0
dropoutRHS: 0
useWeight: 0
weightSep: :
Start to initialize starspace model.
Build dict from input file : /tmp/starspace/data/ag_news.train
Read 5M words
Number of words in dictionary: 95811
Number of labels in dictionary: 4
Loading data from file : /tmp/starspace/data/ag_news.train
Total number of examples loaded : 120000
Initialized model weights. Model size :
matrix : 95815 10
Training epoch 0: 0.01 0.002
Epoch: 100.0% lr: 0.008167 loss: 0.039927 eta: <1min tot: 0h0m2s (20.0%)
---+++ Epoch 0 Train error : 0.03663562 +++--- ☃
Training epoch 1: 0.008 0.002
Epoch: 100.0% lr: 0.006033 loss: 0.018411 eta: <1min tot: 0h0m5s (40.0%)
---+++ Epoch 1 Train error : 0.01966528 +++--- ☃
Training epoch 2: 0.006 0.002
Epoch: 100.0% lr: 0.004017 loss: 0.016100 eta: <1min tot: 0h0m7s (60.0%)
---+++ Epoch 2 Train error : 0.01621233 +++--- ☃
Training epoch 3: 0.004 0.002
Epoch: 100.0% lr: 0.002050 loss: 0.015269 eta: <1min tot: 0h0m10s (80.0%)
---+++ Epoch 3 Train error : 0.01403413 +++--- ☃
Training epoch 4: 0.002 0.002
Epoch: 100.0% lr: 0.000017 loss: 0.012657 eta: <1min tot: 0h0m12s (100.0%)
---+++ Epoch 4 Train error : 0.01240537 +++--- ☃
Saving model to file : /tmp/starspace/models/ag_news
Saving model in tsv format : /tmp/starspace/models/ag_news.tsv
Start to evaluate trained model:
Arguments:
lr: 0.01
dim: 10
epoch: 5
maxTrainTime: 8640000
validationPatience: 10
saveEveryEpoch: 0
loss: hinge
margin: 0.05
similarity: dot
maxNegSamples: 10
negSearchLimit: 50
batchSize: 5
thread: 10
minCount: 1
minCountLabel: 1
label: __label__
label: __label__
ngrams: 1
bucket: 2000000
adagrad: 1
trainMode: 0
fileFormat: fastText
normalizeText: 0
dropoutLHS: 0
dropoutRHS: 0
useWeight: 0
weightSep: :
Start to load a trained starspace model.
STARSPACE-2018-2
Initialized model weights. Model size :
matrix : 95815 10
Model loaded.
Loading data from file : /tmp/starspace/data/ag_news.test
Total number of examples loaded : 7600
Predictions use 4 known labels.
------Loaded model args:
Arguments:
lr: 0.01
dim: 10
epoch: 5
maxTrainTime: 8640000
validationPatience: 10
saveEveryEpoch: 0
loss: hinge
margin: 0.05
similarity: dot
maxNegSamples: 10
negSearchLimit: 5
batchSize: 5
thread: 10
minCount: 1
minCountLabel: 1
label: __label__
label: __label__
ngrams: 1
bucket: 2000000
adagrad: 1
trainMode: 0
fileFormat: fastText
normalizeText: 0
dropoutLHS: 0
dropoutRHS: 0
useWeight: 0
weightSep: :
Predictions use 4 known labels.
Evaluation Metrics :
hit@1: 0.464605 hit@10: 1 hit@20: 1 hit@50: 1 mean ranks : 1.69842 Total examples : 7600
aistudio@jupyter-112853-2339160:~/Starspace$
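If the run above is saved to a file, the per-epoch training error and the final hit@1 can be pulled out with awk. This is a rough sketch that assumes the log lines have the same token layout as the output shown above; the function names train_errors and hit_at_1 and the file name run.log are hypothetical.

```shell
# Print "epoch error" pairs from a StarSpace training log read on stdin.
# Assumes lines of the form: ---+++ Epoch 0 Train error : 0.03663562 +++---
train_errors() {
  awk '/Train error/ {
    for (i = 1; i <= NF; i++) {
      if ($i == "Epoch") epoch = $(i + 1)
      if ($i == "error") err = $(i + 2)   # skip the ":" token
    }
    print epoch, err
  }'
}

# Print the hit@1 value from the evaluation metrics line read on stdin.
hit_at_1() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "hit@1:") print $(i + 1) }'
}
```

For example, after `bash examples/classification_ag_news.sh 2>&1 | tee run.log`, running `train_errors < run.log` and `hit_at_1 < run.log` would list the five epoch errors and the hit@1 score without scrolling through the argument dumps.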
Facebook StarSpace example data
Introduction to the AG news topic classification dataset
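AG's News Topic Classification Dataset, Version 3, updated September 9, 2015
AG is a collection of more than 1 million news articles. The articles were gathered from more than 2,000 news sources by ComeToMyHead over more than a year of activity. ComeToMyHead is an academic news search engine that has been running since July 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc.), information retrieval (ranking, search, etc.), XML, data compression, data streaming, and any other non-commercial activity. For more information, see http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html.
The AG's news topic classification dataset was constructed by Xiang Zhang (zhang@nyu.edu) from the dataset above. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).
The AG's news topic classification dataset is constructed by choosing the 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples, so there are 120,000 training samples and 7,600 testing samples in total.
The file classes.txt contains the list of classes corresponding to each label.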
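The files train.csv and test.csv contain all the samples as comma-separated values. There are 3 columns in them, corresponding to class index (1 to 4), title, and description. The title and description are escaped using double quotes ("), and any internal double quotes are escaped by two double quotes (""). New lines are escaped by a backslash followed by an "n" character, that is "\n".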
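The three-column layout and the doubled-quote escaping described above can be illustrated with a small awk sketch. It assumes every field is double-quoted and that fields are separated by exactly `","`, which matches the rows shown in this article but is not a general CSV parser. The function name split_row and the sample row are hypothetical.

```shell
# Split one AG news CSV row (read on stdin) into class index, title, description.
split_row() {
  awk -F'","' '{
    idx = $1;  sub(/^"/, "", idx)     # drop the opening quote of column 1
    title = $2
    desc = $3; sub(/"$/, "", desc)    # drop the closing quote of column 3
    gsub(/""/, "\"", title)           # undo the doubled-quote escaping
    gsub(/""/, "\"", desc)
    printf "class=%s\ntitle=%s\ndesc=%s\n", idx, title, desc
  }'
}

# Illustrative row with an escaped inner quote.
printf '%s\n' '"3","Safety Net (Forbes.com)","He said ""no"" twice."' | split_row
```

Note that normalize_text in the StarSpace script does not need this unescaping, because it simply deletes every double quote.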
Label class file
The file classes.txt contains the list of classes corresponding to each label, in the following format:
aistudio@jupyter-112853-2339160:/tmp/starspace/data/ag_news_csv$ cat classes.txt
World
Sports
Business
Sci/Tech
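Since the class index in the CSV files is 1-based and classes.txt holds one class name per line, an index can be mapped to its name by printing the corresponding line. A minimal sketch, with label_name as a hypothetical helper name:

```shell
# Print the class name for a 1-based class index.
# $1: class index (1 to 4), $2: path to classes.txt (one class name per line).
label_name() {
  local idx=$1
  local classes_file=$2
  sed -n "${idx}p" "$classes_file"
}
```

With the classes.txt shown above, `label_name 3 /tmp/starspace/data/ag_news_csv/classes.txt` would print Business, so the label __label__3 in the converted files stands for business news.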
Training data file
train.csv has the following format, with 3 columns corresponding to class index (1 to 4), title, and description:
aistudio@jupyter-112853-2339160:/tmp/starspace/data/ag_news_csv$ head -10 train.csv
"3","Wall St. Bears Claw Back Into the Black (Reuters)","Reuters - Short-sellers, Wall Street's dwindling\band of ultra-cynics, are seeing green again."
"3","Carlyle Looks Toward Commercial Aerospace (Reuters)","Reuters - Private investment firm Carlyle Group,\which has a reputation for making well-timed and occasionally\controversial plays in the defense industry, has quietly placed\its bets on another part of the market."
"3","Oil and Economy Cloud Stocks' Outlook (Reuters)","Reuters - Soaring crude prices plus worries\about the economy and the outlook for earnings are expected to\hang over the stock market next week during the depth of the\summer doldrums."
"3","Iraq Halts Oil Exports from Main Southern Pipeline (Reuters)","Reuters - Authorities have halted oil export\flows from the main pipeline in southern Iraq after\intelligence showed a rebel militia could strike\infrastructure, an oil official said on Saturday."
"3","Oil prices soar to all-time record, posing new menace to US economy (AFP)","AFP - Tearaway world oil prices, toppling records and straining wallets, present a new economic menace barely three months before the US presidential elections."
"3","Stocks End Up, But Near Year Lows (Reuters)","Reuters - Stocks ended slightly higher on Friday\but stayed near lows for the year as oil prices surged past #36;46\a barrel, offsetting a positive outlook from computer maker\Dell Inc. (DELL.O)"
"3","Money Funds Fell in Latest Week (AP)","AP - Assets of the nation's retail money market mutual funds fell by #36;1.17 billion in the latest week to #36;849.98 trillion, the Investment Company Institute said Thursday."
"3","Fed minutes show dissent over inflation (USATODAY.com)","USATODAY.com - Retail sales bounced back a bit in July, and new claims for jobless benefits fell last week, the government said Thursday, indicating the economy is improving from a midsummer slump."
"3","Safety Net (Forbes.com)","Forbes.com - After earning a PH.D. in Sociology, Danny Bazil Riley started to work as the general manager at a commercial real estate firm at an annual base salary of #36;70,000. Soon after, a financial planner stopped by his desk to drop off brochures about insurance benefits available through his employer. But, at 32, ""buying insurance was the furthest thing from my mind,"" says Riley."
"3","Wall St. Bears Claw Back Into the Black"," NEW YORK (Reuters) - Short-sellers, Wall Street's dwindling band of ultra-cynics, are seeing green again."
Format of the converted file, ag_news.train:
aistudio@jupyter-112853-2339160:~/Starspace$ head -100 /tmp/starspace/data/ag_news.train
__label__3 , tommy hilfiger buys lagerfeld trademarks , tommy hilfiger corp . - whose once-hot preppy clothing business has cooled - is making a play to rekindle growth by buying luxury brand karl lagerfeld .
__label__1 , alleged coup leader mann
__label__3 , jury rules 9/11 was two attacks , attaching victory to a string of defeats , a jury yesterday agreed with world trade center developer larry silverstein
__label__2 , lions
__label__1 , mugabe higher than tsvangirai in zimbabwe , ( cpod ) aug . 26 , 2004 - adults in zimbabwe are divided over the performance of robert mugabe , according to the afrobarometer conducted by the institute for democracy in south africa , ghanas centre for
__label__3 , a tenuous hold on middle class , even as african americans and other minorities have made economic progress in the last 40 years , many of those reaching the middle-income rung are finding it a hollow promise .
__label__2 , testing times for owen , striker michael owen is facing a battle to prove his form and fitness as the pressure grew on his starting place for englands world cup qualifier against wales .
__label__3 , hk shares end near 45-mo high on hutchison , ppty stks -2- , hong kong ( dow jones ) --hong kong shares ended up monday to close at their highest level in nearly 45 months , led by gains in property counters and blue-chip hutchison whampoa .
__label__4 , paleontologists put ancient long-necked monster in its place , iraffes evolved long necks to browse in trees high above the competition . that is as plain as the explanation the wolf in granny clothing gave little red riding hood for his big teeth quot the better to eat you with , my dear .
__label__4 , space radiation may harm astronauts ' blood cells , in the time it takes you to read this sentence , more than 10 million red blood cells in your body will die . don ' t be alarmed it ' s natural , and stem cells in your bone marrow are constantly making enough new cells to replace the dying ones . but what if those blood-making cells stopped working ? this could be a concern for astronauts taking long trips beyond earth orbit .
__label__3 , world crude oil market slips as supply fears ease , new york ( afp ) - world crude oil prices slipped , halting a price spike triggered by fears of tight energy supplies in the northeastern united states .
__label__3 , new york ' s spitzer says may sue insurer ( reuters ) , reuters - new york attorney general eliot\spitzer will file a lawsuit as early as friday against an\insurance company , the second suit stemming from his sweeping\probe into bid-rigging in the industry , spitzer told reuters on\friday .
__label__4 , a new crew arrives at the space station , a soyuz spacecraft with a replacement crew for the international space station docked safely with the orbiting complex early saturday .
__label__1 , sudanese aid workers released by rebels ( ap ) , ap - rebels released six sudanese aid workers early wednesday in darfur , four days after they went missing during a trip to register refugees fleeing a brutal campaign of killings in the war-ravaged region , a u . n . official said .
__label__1 , blair pledges to
__label__4 , group new england not reducing mercury ( ap ) , ap - if environmental policy were scored like grade-school spelling tests , new england ' s mercury reduction efforts wouldn ' t get any gold stars .
__label__2 , los angeles to bid for summer olympics , after hosting the summer olympics in 1932 and 1984 , the city of angels may submit a bid for the games . the city council unanimously endorsed a proposal wednesday to vie for the summer games as early as 2016 .
__label__1 , nigeria delta rebels agree truce , the nigerian government has confirmed reports that rebel groups in the country
__label__3 , asian stock markets close mixed , asian stock markets closed mixed monday , with the key indexes dipping both in hong kong and singapore . japanese financial markets were closed for a national holiday .
__label__1 , us airline starts vietnam flights , the us resumes commercial flights to vietnam , 30 years after the last flight at the end of the war .
__label__4 , intel introduces new itanium , but microsoft skips it ( newsfactor ) , newsfactor - intel has introduced a third-generation itanium 2 into the market , but microsoft ( nasdaq msft ) has noted that its windows server 2003 compute cluster edition will not run on servers with the chip .
__label__2 , sexson agrees to \$50 million contract with seattle , the seattle mariners added power to one of the weakest lineups in baseball wednesday , agreeing to a \$50 million , four-year contract with free agent first baseman richie sexson .
__label__3 , circuit city cuts quarterly loss , stock up , new york ( reuters ) - circuit city stores inc . < a href=http //www . investor . reuters . com/fullquote . aspx ? ticker=cc . n target=/stocks/quickinfo/fullquote> cc . n< /a> on friday reported a narrower quarterly loss as it kept a tight rein on expenses and sales of digital televisions and notebook computers rose , sending its stock up 5 percent .
__label__2 , juve restore five-point lead , juventus maintained their unbeaten run and restored their five-point serie a lead with this hard-fought win today . alessandro del piero opened the scoring with a low drive just after the half-hour and marcelo
__label__2 , woods eyes fourth title after singh ' s withdrawal , london ( reuters ) - three-times winner tiger woods bids for a third successive title at this week ' s wgc-american express championship in kilkenny , ireland .
__label__1 , corruption costs 200bn a year , corruption costs businesses and governments more than 220 billion a year , with a number of oil producing states among the worst offenders , according to an international anti-graft watchdog .
__label__3 , canada cuts unemployment premium by smallest amount since 1995 , canada cut the premium it plans to charge workers and companies for unemployment insurance in 2005 by 7 cents , the smallest reduction since the government started lowering the rates a decade ago .
__label__3 , yukos files for bankruptcy protection in bush
__label__1 , kofi annan globally popular mainstay of the un , having dedicated more than 40 years to the united nations , the seventh secretary general kofi annan is a well-liked and admired global figure .
__label__2 , astros
__label__1 , three die in suicide bombing , a car suicide bomber attacked an american convoy in the north iraqi city of mosul yesterday , killing one us soldier and twoiraqis .
__label__4 , taking microsoft for a spin ? , the software juggernaut that conquered the desktop is racing to get windows into your next car .
__label__3 , logan may benefit from fewer o
__label__2 , nets 88 raptors 86 , east rutherford , nj jason kidd scored ten points in his season debut to help the new jersey nets beat the toronto raptors 88-to-86 .
__label__3 , reuters shares up on instinet sale rumors , london -- shares in reuters group plc rose thursday on speculation that its part-owned stock-trading network instinet group inc . is up for sale .
__label__2 , wrapup 1-feyenoord , steaua reach last 32 in uefa cup , feyenoord won the battle of the former winners to reach the last 32 of the uefa cup on wednesday after a 2-1 victory over schalke 04 .
__label__4 , icann domain transfer policy takes effect , november 12 , 2004 ( idg news service ) - a new transfer policy for inter-registrar domain names went into effect today , according to the internet corporation for assigned names and numbers ( icann ) .
__label__3 , higher oil prices prompt downward revision to
__label__2 , steelers 13 , dolphins 3 , the rookie overcame a slow start , remnants of hurricane jeanne and the miami dolphins
__label__3 , us army to withhold portion of halliburton payments , the billing dispute between the us army and houston-based halliburton co . continues as the army on tuesday said it would withhold paying 15 percent of future invoices from halliburton .
__label__1 , house , senate agree on corporate tax bill ( reuters ) , reuters - u . s . senate and house of\representatives negotiators agreed on wednesday on a huge\corporate tax bill that will repeal export subsidies that\violate global trade rules and give manufacturers a new tax\break .
__label__3 , harvard ' s\$12 billion man , harvard university ' s \$12 billion man doesn ' t wear a tie , takes the subway to work , and eats his lunch in the cafeteria on the fourth floor of the federal reserve building . jack meyer is a person of few pretensions , but strong beliefs .
__label__1 , un peacekeepers rush to storm-ravaged gonaives to stop looting , un peacekeepers rushed to this storm-ravaged city monday to guard against looters stealing food aid while military doctors performed operations on howling patients and hundreds of weary victims , after a miserable night spent out in
__label__2 , bentley comes home just ahead of ne-10 crowd , the northeast-10 conference race is entering the final third of the season , and the four teams atop the standings are separated by just a half-game .
__label__4 , mandrakesoft in bid for eal5 certification , linux vendor mandrakesoft is teaming with a consortium of european partners in an effort to win common criteria evaluation assurance level 5 ( eal5 ) , the highest security certification for defense and other highly sensitive areas of governmental it
__label__2 , giants safety to have knee surgery ( ap ) , ap - new york giants safety shaun williams will have surgery on his left knee monday to repair damaged cartilage .
__label__3 , qwest to pay \$250 mln to settle sec fraud charges ( update2 ) , qwest communications international inc . , the fourth-largest us local-telephone provider , agreed to pay \$250 million to settle us securities and exchange commission allegations that it
__label__3 , yukos files for bankruptcy in u . s . , the yukos oil company has filed for bankruptcy in the united states and appealed for a temporary restraining order against the auction of its main production unit that is scheduled for sunday - dramatically challenging the russian government to enter arbitration proceedings .
__label__2 , blackburn set to sign french world cup winner youri djorkaeff , blackburn rovers are poised to sign french world cup winner youri djorkaeff , who was released by bolton in the close-season , the english premiership soccer club said .
__label__1 , pringle to risk zim deportation , journalist derek pringle will refuse to sign a declaration that commits him to cover only cricket in zimbabwe .
__label__3 , yukos seeks us bankruptcy , russian oil major yukos has filed for bankruptcy protection in a us court and will seek an injunction to stop russia from auctioning off its main production unit on dec . 19 , it said on wednesday .
__label__3 , stocks fall as china raises rates , new york ( reuters ) - u . s . stocks fell on thursday morning after china ' s central bank said it was raising interest rates for the first time in nine years , fueling concerns that global economic growth may slow .
__label__3 , airline alliance downed in court , aviation industry analysts expect qantas to head in a different direction now the air nz alliance is off . picture reuters . qantas
__label__3 , eu not ready to lift sanctions against us , there will be no early end to punitive european tariffs on us products despite a decision by the us congress to end a corporate tax subsidy ruled illegal by the world trade organisation ( wto ) pascal lamy , the european unions chief trade negotiator
__label__1 , congo says its troops are fighting rwandan forces ( reuters ) , reuters - congo ' s government spokesman denied\tuesday that rival army factions were fighting each other in\the east of the country and insisted the clashes were between\congolese and rwandan forces .
__label__3 , us air asks court to end labor contracts , us airways asked a bankruptcy court yesterday to throw out contracts covering passenger service agents , flight attendants , mechanics and other workers and replace them with less-expensive ones .
__label__4 , tsmc to tape out 40 more products at 90nm by year-end , taiwan semiconductor manufacturing company ( tsmc ) has already produced over 80 products using 90nm processes , and the company expects to tape out 40 more products at this technology node by the end of this year , according to genda hu , vice president of
__label__1 , the threat at home , european elites , like american elites , are having trouble understanding the recent american elections . how can 59 , 054 , 087 people be so dumb ?
__label__4 , sony , toshiba and ibm prepare to reveal cell details , the three partners involved in the development of the cell processor , which will power the playstation 3 along with a number of other devices , have unveiled new details about the technology behind the chip .
__label__2 , simms impressive in bucs ' preseason debut ( ap ) , ap - jon gruden can understand why tampa bay fans are talking about chris simms .
__label__1 , megawati defends achievements ahead of vote , jakarta ( reuters ) - indonesian president megawati sukarnoputri , trying to maintain momentum in a tight election battle , said monday her government had stabilized the economy and cracked down hard on militants and separatists .
__label__4 , san francisco giants to offer wi-fi instant replays , san francisco ( ap ) -- peanuts , hot dogs and wireless instant replays . it ' s the future of baseball . . .
__label__4 , letting the internet knock on the door , residents wanting to meet their neighbors are doing so electronically through the web site meettheneighbors . org .
__label__1 , oil prices drop \$1 despite iraq sabotage , washington - oil futures dropped by nearly \$1 per barrel monday despite pipeline sabotage in iraq that has delayed exports from a southern port - reinforcing the view among traders that prices had risen too fast earlier this summer . it just goes to show you that when the psychology turns , it turns , said tom bentz , a trader at bnp paribas futures in new york . . .
__label__3 , more than 300 , 000 vie for 3 , 000 la port jobs , more than 300 , 000 people participated in a lottery on thursday for 3 , 000 well-paying longshore jobs at the ports of los angeles and long beach , where shipping volumes are booming , a pacific maritime association spokesman said .
__label__3 , bank calls for argentine reforms , the world bank approves \$200m for infrastructure projects in argentina , but demands that the country complete restructuring its defaulted debt .
__label__1 , canada to boost defence and security commitments during bush visit ( canadian press ) , canadian press - ottawa ( cp ) - canada may eventually agree to send soldiers to train iraqi military officers but it won ' t make any commitments when u . s . president george w . bush visits the capital on tuesday .
__label__3 , australian bank group reports record profit , sydney australia amp new zealand banking group said tuesday that second-half profit rose 17 percent to a record after it bought new zealand
__label__4 , flowering phone is environmental wake-up call , a rose is a rose is a phone ? british scientists seeking to protect the environment have designed a biodegradable mobile phone cover that breaks down in soil when discarded and sprouts
__label__1 , ritter family files wrongful death suit , los angeles - the family of john ritter has sued a burbank hospital , accusing it of negligence in the death of the 54-year-old actor . ritter ' s wife amy yasbeck and his four children , including actor jason ritter , filed the wrongful death suit in los angeles county superior court on sept . . .
__label__3 , us official backs producers on china apparel curbs , hong kong brushing aside chinese warnings of a possible challenge at the world trade organization , a senior us trade official said here on friday that the united states would limit apparel imports if american manufacturers provided evidence that such
__label__3 , at hsbc , eldon plans departure , hsbc holdings , europe
__label__3 , google cuts ipo price by a quarter to \$85-\$95 a share ( update7 ) , aug . 18 ( bloomberg ) -- google inc . , on the eve of pricing the second-biggest internet initial public offering , slashed the value of the sale almost in half amid the worst market for us ipos in almost two years .
__label__3 , salvation army rings in the holidays , or so hopes the salvation army . each year volunteers and employees appear in malls across broward county to ring bells from early november to christmas eve .
__label__2 , webb two off sorenstam
__label__4 , lycos europe anti-spam screensaver bites the dust ! , although close on 100 , 000 people downloaded the free screensaver , it received a great deal of criticism . many believed it lowered itself to the tactics of the spammers themselves and could have caused more problems than it solved .
__label__4 , ' killer apps ' demand computer upgrades , this is turning out to be the year of the upgrade for many computer gamers , thanks to the arrival of blockbuster game titles such as doom 3 and half-life 2 , on its way to retail shelves after a year ' s worth of delays . < font face=verdana , ms sans serif , arial , helvetica size=-2\ color=
__label__4 , quickbooks 2005 makes mac-pc transfer easier , more ( maccentral ) , maccentral - later this week intuit inc . will officially announce quickbooks pro 2005 for mac , the latest release of its small business accounting software -- although the company has already posted details of the new release to its web site . major changes in this new release include integrated ical support , the ability to back up data files to . mac , easier file sharing with the windows version of quickbooks and the ability to add pdf backgrounds to forms .
__label__4 , coral reefs may grow with global warming , rising levels of greenhouse gases may not be quite as bad for coral reefs as was previously thought . a team of australian scientists say that the damage done by increasing amounts of carbon dioxide in the
__label__1 , powell arrives in israel in latest push for peace , secretary of state colin l . powell , arriving in israel for his first visit in a year and a half , said he would work with palestinian leaders to come up with an american
__label__4 , u . scientists create
__label__2 , bentley hoping to avoid getting tripped up by post , a year ago , the bentley falcons ended c . w . post ' s two-year reign as northeast-10 conference football champions . now the pioneers have a chance to return the favor . sitting atop the conference standings , c . w . post ( 4-1 , 4-0 ) hosts bentley tomorrow and can inflict serious damage on any hopes the falcons have of repeating .
__label__1 , golf woods upset at ryder post , tiger woods says mark o ' meara should have been named us ryder cup captain for 2006 .
__label__1 , milosevic takes back seat as first defence witness on stand ( afp ) , afp - former yugoslav president slobodan milosevic was forced to take a back seat at his trial for genocide and war crimes as lawyers imposed on him by a un tribunal examined his first defence witness .
__label__2 , fifa fines italy , netherlands , france ( ap ) , ap - soccer ' s governing body wednesday fined italy , the netherlands and france for being too lenient with doping offenders .
__label__4 , us mobile firms ' close ' to merger , sprint and nextel reportedly reach a tentative agreement on a merger which will create the third largest mobile phone operator in the us .
__label__1 , ' mercenaries ' coup trial begins , the trial of 14 men accused of plotting to topple equatorial guinea ' s president opens in malabo on monday .
__label__3 , treasuries hit again by technical selling , new york ( reuters ) - treasuries prices were swept lower for a second straight session on friday as technical selling from different pockets of the market overshadowed a moderate set of u . s . inflation numbers .
__label__1 , parliament should sit if election not called latham , opposition leader mark latham today urged prime minister john howard to either call an election and dissolve parliament or allow this week
__label__4 , grid researchers go commerical , the researchers who spawned the idea of grid computing will launch a company on monday to commercialise what so far has been a very academic software project for sharing computing resources .
__label__3 , greenspan says us can weather oil price increases , record oil prices are unlikely to inflict the economic pain they did in the 1970s , federal reserve chairman alan greenspan said friday , adding that he thought the world could adjust to higher-priced oil .
__label__2 , holmes has strained knee status uncertain ( ap ) , ap - priest holmes has strained ligaments in his right knee and might not be able to play for the kansas city chiefs next week .
__label__3 , far fewer jobs were added in november than forecast , the economy added 112 , 000 payroll jobs in november , far fewer than the month before and not enough to keep up with average increases in the adult population .
__label__4 , invasion of the video game ads , ad networks target online gamers as next big audience for product placements .
__label__2 , spurs frustrated by norwichs green , london tottenham were held to a 0-0 draw by premier league newcomers norwich yesterday , spurning the chance to go third in the standings behind fellow london clubs arsenal and chelsea .
__label__1 , blair returns to downing street after heart trouble , hamish robertson british prime minister tony blair has returned to downing street after undergoing hospital treatment for a heart ailment .
__label__2 , oklahoma state ' s miles gets lsu job , names oklahoma state football coach les miles will replace nick saban at louisiana state , a source close to the negotiations said last night . lsu called a news conference for today , and the source , speaking to the associated press on condition of anonymity , said miles will be introduced then . lsu athletic director skip bertman did not immediately return a call seeking . . .
__label__1 , conflicting reports over iraq release , conflicting accounts about the possible release of iraqi detainees -- among them the woman known as quot dr . germ quot -- were issued by the us and iraqi governments wednesday .
__label__4 , microsoft sues firms for software violations , the world
__label__3 , office depot earns fall , blames weather , new york ( reuters ) - office depot inc . < a href=http //www . investor . reuters . com/fullquote . aspx ? ticker=odp . n target=/stocks/quickinfo/fullquote> odp . n< /a> on wednesday unveiled plans to muscle in on rival staple inc . ' s turf to boost sales and stem a fall in market share in the increasingly cutthroat office supply business .
aistudio@jupyter-112853-2339160:~/Starspace$
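Since every normalized line starts with a `__label__N` token, a quick sanity check is to count how many examples each label has. The following is a minimal sketch, not part of the StarSpace script: `count_labels` is a hypothetical helper name, and the path assumes the `DATADIR` used by classification_ag_news.sh.

```shell
# Hypothetical helper (not part of StarSpace): count the examples per label.
# Assumes the first whitespace-separated token of each line is the label.
count_labels() {
  awk '{ count[$1]++ } END { for (label in count) print label, count[label] }' "$1" | sort
}

# Assumed location of the normalized training file (DATADIR in the script).
TRAIN_FILE=/tmp/starspace/data/ag_news.train

# Only run the check if the file has already been generated.
if [ -f "$TRAIN_FILE" ]; then
  count_labels "$TRAIN_FILE"
fi
```

Each output line has the form `label count`, which makes it easy to see whether all four classes are present after normalization.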
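Test data file
test.csv is the raw, un-normalized input. Each row has three quoted, comma-separated fields: the class label, the title, and the description. The first ten rows look like this: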
aistudio@jupyter-112853-2339160:/tmp/starspace/data/ag_news_csv$ head -10 test.csv
"3","Fears for T N pension after talks","Unions representing workers at Turner Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul."
"4","The Race is On: Second Private Team Sets Launch Date for Human Spaceflight (SPACE.com)","SPACE.com - TORONTO, Canada -- A second\team of rocketeers competing for the #36;10 million Ansari X Prize, a contest for\privately funded suborbital space flight, has officially announced the first\launch date for its manned rocket."
"4","Ky. Company Wins Grant to Study Peptides (AP)","AP - A company founded by a chemistry researcher at the University of Louisville won a grant to develop a method of producing better peptides, which are short chains of amino acids, the building blocks of proteins."
"4","Prediction Unit Helps Forecast Wildfires (AP)","AP - It's barely dawn when Mike Fitzpatrick starts his shift with a blur of colorful maps, figures and endless charts, but already he knows what the day will bring. Lightning will strike in places he expects. Winds will pick up, moist places will dry and flames will roar."
"4","Calif. Aims to Limit Farm-Related Smog (AP)","AP - Southern California's smog-fighting agency went after emissions of the bovine variety Friday, adopting the nation's first rules to reduce air pollution from dairy cow manure."
"4","Open Letter Against British Copyright Indoctrination in Schools","The British Department for Education and Skills (DfES) recently launched a ""Music Manifesto"" campaign, with the ostensible intention of educating the next generation of British musicians. Unfortunately, they also teamed up with the music industry (EMI, and various artists) to make this popular. EMI has apparently negotiated their end well, so that children in our schools will now be indoctrinated about the illegality of downloading music.The ignorance and audacity of this got to me a little, so I wrote an open letter to the DfES about it. Unfortunately, it's pedantic, as I suppose you have to be when writing to goverment representatives. But I hope you find it useful, and perhaps feel inspired to do something similar, if or when the same thing has happened in your area."
"4","Loosing the War on Terrorism","\\""Sven Jaschan, self-confessed author of the Netsky and Sasser viruses, is\responsible for 70 percent of virus infections in 2004, according to a six-month\virus roundup published Wednesday by antivirus company Sophos.""\\""The 18-year-old Jaschan was taken into custody in Germany in May by police who\said he had admitted programming both the Netsky and Sasser worms, something\experts at Microsoft confirmed. (A Microsoft antivirus reward program led to the\teenager's arrest.) During the five months preceding Jaschan's capture, there\were at least 25 variants of Netsky and one of the port-scanning network worm\Sasser.""\\""Graham Cluley, senior technology consultant at Sophos, said it was staggeri ...\\"
"4","FOAFKey: FOAF, PGP, Key Distribution, and Bloom Filters","\\FOAF/LOAF and bloom filters have a lot of interesting properties for social\network and whitelist distribution.\\I think we can go one level higher though and include GPG/OpenPGP key\fingerpring distribution in the FOAF file for simple web-of-trust based key\distribution.\\What if we used FOAF and included the PGP key fingerprint(s) for identities?\This could mean a lot. You include the PGP key fingerprints within the FOAF\file of your direct friends and then include a bloom filter of the PGP key\fingerprints of your entire whitelist (the source FOAF file would of course need\to be encrypted ).\\Your whitelist would be populated from the social network as your client\discovered new identit ...\\"
"4","E-mail scam targets police chief","Wiltshire Police warns about ""phishing"" after its fraud squad chief was targeted."
"4","Card fraud unit nets 36,000 cards","In its first two years, the UK's dedicated card fraud unit, has recovered 36,000 stolen cards and 171 arrests - and estimates it saved 65m."
aistudio@jupyter-112853-2339160:/tmp/starspace/data/ag_news_csv$
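To see how a raw CSV row turns into the `__label__` format, the per-line steps of `normalize_text` can be applied to a single row. This is only a sketch: `normalize_line` is a hypothetical name, the sed expressions are a portable restatement of the ones in the script, and the final shuffle is deliberately left out so the result is deterministic.

```shell
# Sketch of the per-line steps in normalize_text, without the final shuffle.
# normalize_line is a hypothetical name used only for this illustration.
normalize_line() {
  tr '[:upper:]' '[:lower:]' \
    | sed -e 's/^/__label__/' \
    | sed -e "s/'/ ' /g" -e 's/"//g' -e 's/\./ . /g' -e 's/<br \/>/ /g' \
          -e 's/,/ , /g' -e 's/(/ ( /g' -e 's/)/ ) /g' -e 's/!/ ! /g' \
          -e 's/?/ ? /g' -e 's/;/ /g' -e 's/:/ /g' \
    | tr -s ' '
}

# One row taken from the test.csv listing above.
row='"4","E-mail scam targets police chief","Wiltshire Police warns about ""phishing"" after its fraud squad chief was targeted."'

# Lowercase the row, prefix the label, drop quotes, and space out punctuation.
printf '%s\n' "$row" | normalize_line
```

The quoted label column `"4"` ends up as the `__label__4` prefix, and the commas that separated the CSV fields remain in the text as standalone ` , ` tokens, which is why they appear in every line of the normalized files.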