[PHP] Truncating Chinese and English Text While Keeping Words Intact

by Mesak

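When writing PHP you often need to take an excerpt from a piece of text, but cutting it off at a fixed position can split a word in the middle and make the excerpt lose its original meaning.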

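I wrote a short piece of code to handle this. It compares the number of Chinese and English characters in the text to decide which one dominates; if the text is mostly Chinese, the excerpt length is divided by two (rounded down to an integer).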

<?php
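function word_cut($string, $limit, $pad = "...") {
    // Count characters, not bytes, so multibyte text is measured correctly.
    $len = mb_strlen($string, 'UTF-8');
    if ($len <= $limit) {
        return $string;
    }
    // Count ASCII word characters (what \w matches in the default locale).
    // The class is spelled out so the count does not depend on the locale.
    $ascii = preg_match_all('/[A-Za-z0-9_]/', $string);
    // Fewer than half ASCII word characters: treat the text as mostly
    // Chinese and halve the limit.
    if ($ascii < $len / 2) {
        $limit = intval($limit / 2);
    }
    // Full-width and half-width delimiters that mark a safe cut position.
    $cut_word = array('。', ',', ';', '.', ',', ';', ' ', "\n");
    $string = mb_substr($string, 0, $limit, 'UTF-8');
    // Upper bound on how many characters to walk back.
    $test_max = 15;
    while ($limit > 0) {
        $limit--;
        $key = mb_substr($string, $limit, 1, 'UTF-8');
        // Cut at a delimiter, or give up once the bound is used up.
        if (in_array($key, $cut_word, true) || $test_max == 0) {
            $string = mb_substr($string, 0, $limit, 'UTF-8');
            break;
        }
        $test_max--;
    }
    return $string . $pad;
}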
/***** Example ****/ 


$str = 'Lorem ipsum dolor sit amet, summo impedit cum ex, eam possim suavitate voluptatum id. Purto probatus assueverit id pri, per no facilisi ullamcorper. Esse scripserit nam id, esse solet deleniti an vel. No nec quem nisl deterruisset, qui eu putant volumus, tollit fierent vel cu. Eu eos legimus signiferumque.
Ut civibus fastidii eos. Doming deleniti instructior ne mei, eu veri possim vel. Cu voluptua appareat nam, ea est odio case, vim ut nullam aliquid. In pro vitae quando feugait. Ut tota voluptatibus sit, ad libris numquam corrumpit sed, eum sumo aeterno id.
Vix wisi civibus ea, officiis dignissim euripidis sit ea, quot constituam vim an. Quo consul convenire democritum id, reque facete dissentiunt ius cu. Te virtute albucius eum. Salutandi imperdiet voluptatibus eam id, at nam adhuc ipsum voluptaria. Ad summo ridens epicuri qui, id sed sadipscing reprehendunt. Iudico melius forensibus an per, ut augue nominavi eligendi usu, pro eu velit facete ceteros.
His modus iusto graecis id. Duo cu petentium urbanitas, usu choro fabulas ne. Novum vulputate ei nam, id laoreet tractatos disputando sed. Eu quod erroribus eum, porro voluptaria has an, te ius iriure praesent. Est ad dictas facilisi liberavisse, ne posse fugit elitr usu. Voluptaria consequuntur has te, duis commodo fierent eam ex. Mei ea augue dolorem.
Eu mea dicant blandit. Doctus albucius consequuntur est eu, epicurei consetetur et ius. No sea diam vidit habemus, qui quem possim consequat ei. Cu vim minim mollis, an vitae pericula percipitur duo. Habeo clita laoreet te nec, pro ea nobis lobortis.';

echo word_cut($str, 20 ) ; 

$chistr = '洲門神叫?時朋夠。式面日除在準。人陸物現。出媽性改地主部發叫與人我;界山一處!功人做進分亞廣無藝應議:他國我人,增大增作直到不些重參而子統息?特巴夠早平素,為童只我不年離,時了外!還成小有不不通岸需他……我要。
健平辦高腦時種總。花人力會什引依社輪陽皮們巴別基這交開聲推人黨我謝,改達因三著童奇通度手來向:西影智趣正處滿深資消當裡變年軍量力孩;來樣快……不次苦重的可。
片己定細無時此近假大的怕切親表魚動能劇結定相科備才微可看關我經子想當時了不提頭為樹施房我、生爭紀覺由我精什前間正,輪一遊市花快,無放內幾原意沒比木;告紙我聽一情相食元。
了特小屋幾;元星根今明北,上程陽一風人別比城區,曾史久爸為場年完全過四必傳作資的,線才起結上?器客起年簡有高!我校原中電道他這邊當上建正運太由們黨想,把一公我成不單?區治如過正海真展有已位思只為玩樣年那一、工學大散,到實沒語長於代王求,環那車生著些行市半……水臺產?
能式人裝軍說然,著球保緊叫業。所不千我業,市得車會、或地的!工連府;病兒製動你單單計車美。';
echo '<br>';
echo  word_cut($chistr, 20 );

The output is as follows:

Lorem ipsum dolor 
洲門神叫?時朋夠。

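A brief explanation of how it works: first, the function checks whether the character count is within the cutoff length; if it is, the string is returned unchanged.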
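To illustrate the length check, here is a minimal sketch of why characters rather than bytes are counted. It assumes UTF-8 input and that the mbstring extension is loaded; `$sample` is only an illustrative value.

```php
<?php
// Sketch only: compares byte length with character length for mixed text.
// Assumes the mbstring extension is available and the input is UTF-8.
$sample = '中文 test';

$bytes = strlen($sample);              // bytes: 2 * 3 + 1 + 4 = 11
$chars = mb_strlen($sample, 'UTF-8');  // characters: 2 + 1 + 4 = 7

echo $bytes, ' bytes, ', $chars, " characters\n";
```

With a byte-based check, a short Chinese string could be treated as longer than the limit even though it has few characters.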

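Next, it works out whether the text is mostly Chinese or mostly English, using a regular expression with preg_match_all to pick out the characters.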
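The language check can be sketched on its own as follows. `effective_limit` is a hypothetical helper, not part of the original function, and the sketch assumes PHP 7 or later with mbstring. It spells out the ASCII word characters as an explicit class so that the count does not depend on locale or regex modifiers.

```php
<?php
// Hypothetical helper: returns the limit that would apply after deciding
// whether the text is mostly Chinese or mostly English.
function effective_limit(string $string, int $limit): int
{
    $total = mb_strlen($string, 'UTF-8');
    // preg_match_all() returns the number of matches it found.
    $ascii = preg_match_all('/[A-Za-z0-9_]/', $string);
    // Fewer than half ASCII word characters: treat as mostly Chinese.
    return ($ascii < $total / 2) ? intdiv($limit, 2) : $limit;
}

echo effective_limit('Lorem ipsum dolor sit amet', 20), "\n"; // 20
echo effective_limit('洲門神叫時朋夠式面日除在準', 20), "\n";   // 10
```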

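Then the string is cut at the limit, and the code walks backwards from the end looking for a punctuation mark, so the excerpt can only get shorter, never longer. English is fairly simple: in most cases checking for punctuation or a space is enough.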
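A minimal sketch of the backward walk, without any bound on the number of steps. `cut_at_delimiter` and its parameters are hypothetical names chosen for illustration, and PHP 7 or later with mbstring is assumed.

```php
<?php
// Hypothetical helper: cut to $limit characters, then walk back to the
// nearest delimiter so the result is never longer than $limit.
function cut_at_delimiter(string $string, int $limit, array $delimiters): string
{
    $cut = mb_substr($string, 0, $limit, 'UTF-8');
    // Index 0 is skipped so the result is never empty.
    for ($i = mb_strlen($cut, 'UTF-8') - 1; $i > 0; $i--) {
        if (in_array(mb_substr($cut, $i, 1, 'UTF-8'), $delimiters, true)) {
            return mb_substr($cut, 0, $i, 'UTF-8'); // drop the delimiter itself
        }
    }
    return $cut; // no delimiter found: keep the hard cut
}

// The hard cut would be "Lorem ipsum dolor si"; the walk stops at the space.
echo cut_at_delimiter('Lorem ipsum dolor sit amet', 20, [' ', ',', '.']), "\n";
// prints "Lorem ipsum dolor"
```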

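Chinese is more troublesome. It also relies on punctuation, but Chinese text written with poor habits sometimes has no punctuation at all, so I set an upper bound on the backward walk: once the number of steps back exceeds that bound, the search simply stops.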
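The effect of the bound can be sketched like this. `cut_with_bound` and `$maxSteps` are hypothetical names, and the exact stopping rule here is a simplification rather than a copy of the function's loop; PHP 7 or later with mbstring is assumed.

```php
<?php
// Hypothetical helper: walk back at most $maxSteps characters looking for a
// delimiter, then cut wherever the walk stopped.
function cut_with_bound(string $string, int $limit, array $delimiters, int $maxSteps): string
{
    $cut = mb_substr($string, 0, $limit, 'UTF-8');
    $pos = mb_strlen($cut, 'UTF-8');
    for ($step = 0; $step < $maxSteps && $pos > 1; $step++) {
        $pos--;
        if (in_array(mb_substr($cut, $pos, 1, 'UTF-8'), $delimiters, true)) {
            break; // found a delimiter: cut just before it
        }
    }
    return mb_substr($cut, 0, $pos, 'UTF-8');
}

// Unpunctuated text: the walk gives up after 5 steps, leaving 15 characters.
$plain = str_repeat('字', 30);
echo mb_strlen(cut_with_bound($plain, 20, ['。'], 5), 'UTF-8'), "\n"; // 15

// Punctuated text: the walk stops at the full stop.
echo cut_with_bound('洲門神叫時朋夠。式面日除', 10, ['。'], 5), "\n"; // 洲門神叫時朋夠
```

The trade-off is that unpunctuated text loses up to `$maxSteps` characters from the excerpt, in exchange for a guaranteed stop.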

Finally, the ellipsis ("...") is appended.

Chinese placeholder text generator

English placeholder text generator
