cut_str() > Gnuboard 4 Q&A

I went digging for a way to truncate multibyte strings (sequences of up to 6 bytes).

The code below comes from KLDP.

function u8_strcut($str, $limit)
/* Note: */
/* $str must be a valid UTF-8 string */
/* it may return an empty string even if $limit > 0 */
{
    $len = strlen($str);

    if ($len <= $limit)
        return $str;

    $len = $limit;

    /* ASCII characters are encoded in the range 0x00 to 0x7F.
     * The first byte of a multibyte sequence is in the range 0xC0 to 0xFD.
     * All further bytes are in the range 0x80 to 0xBF.
     */

    /* Back up while the byte at the cut point is a continuation byte. */
    while ($len > 0 && ($ch = ord($str[$len])) >= 128 && ($ch < 192))
        $len--;

    return substr($str, 0, $len);
}
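
As a quick check of the note about empty results, the sketch below repeats u8_strcut() in compact form and runs it on a short Korean string. It assumes the file is saved as UTF-8, so each Hangul syllable takes 3 bytes; the sample strings are made up for illustration.

```php
<?php
// Compact copy of u8_strcut() from the post, repeated so the sketch runs on its own.
function u8_strcut($str, $limit)
{
    $len = strlen($str);
    if ($len <= $limit) {
        return $str;
    }
    $len = $limit;
    // Step back while the byte at the cut point is a continuation byte (0x80-0xBF).
    while ($len > 0 && ($ch = ord($str[$len])) >= 128 && $ch < 192) {
        $len--;
    }
    return substr($str, 0, $len);
}

$title = "가나다";                  // 9 bytes in UTF-8
echo u8_strcut($title, 4), "\n";    // 가   (the cut moves back from byte 4 to byte 3)
echo u8_strcut($title, 6), "\n";    // 가나 (byte 6 is already a character boundary)
var_dump(u8_strcut("가", 2));       // string(0) "": the limit is shorter than one character
```

Note that $limit is a byte count here, not a character count.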

The replies there contain one more function.

// function cut_string_utf8($str, $max_len, $suffix)
// String-cutting function for Unicode (UTF-8) text.
//
function cut_string_utf8($str, $max_len, $suffix)
{
    $n = 0;    // byte offset into $str
    $noc = 0;  // display width counted so far
    $tn = 0;   // byte length of the character read last
    $len = strlen($str);
    while ( $n < $len )
    {
        $t = ord($str[$n]);
        if ( $t == 9 || $t == 10 || (32 <= $t && $t <= 126) )
        {
            $tn = 1;
            $n++;
            $noc++;
        }
        else if ( 194 <= $t && $t <= 223 )
        {
            $tn = 2;
            $n += 2;
            $noc += 2;
        }
        else if ( 224 <= $t && $t <= 239 )  // 0xEF also starts a 3-byte sequence
        {
            $tn = 3;
            $n += 3;
            $noc += 2;
        }
        else if ( 240 <= $t && $t <= 247 )
        {
            $tn = 4;
            $n += 4;
            $noc += 2;
        }
        else if ( 248 <= $t && $t <= 251 )
        {
            $tn = 5;
            $n += 5;
            $noc += 2;
        }
        else if ( $t == 252 || $t == 253 )
        {
            $tn = 6;
            $n += 6;
            $noc += 2;
        }
        else { $tn = 1; $n++; }  // any other byte is skipped without adding width
        if ( $noc >= $max_len ) { break; }
    }
    // Back up one character if the last one overshot the limit.
    if ( $noc > $max_len ) { $n -= $tn; }
    // Nothing was cut off, so return the string without a suffix.
    if ( $n >= $len ) return $str;
    return substr($str, 0, $n) . $suffix;
}

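The limitation of this function is that the input must be a valid UTF-8 string. Printable ASCII characters (plus tab and newline) count as 1 and multibyte characters count as 2, because the function was written for trimming bulletin-board titles. $suffix is the string appended after the cut when the text is longer than $max_len.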

$str = cut_string_utf8("abcdefg", 4, "...");

then $str becomes "abcd...".
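
To make the counting rule concrete, here is a minimal sketch of a width counter that follows the same idea: single-byte characters count as 1, multibyte characters as 2. The name title_width() is hypothetical and not part of the KLDP code, and unlike the function above it counts every single-byte character, control characters included. It assumes valid UTF-8 input and a UTF-8 source file.

```php
<?php
// Hypothetical helper: display width under the rule "single-byte = 1, multibyte = 2".
function title_width($str)
{
    $width = 0;
    // The /u modifier makes preg_split() break the string into whole UTF-8 characters.
    foreach (preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $width += (strlen($ch) === 1) ? 1 : 2;
    }
    return $width;
}

echo title_width("abc"), "\n";          // 3
echo title_width("가나다"), "\n";       // 6
echo title_width("PHP 게시판"), "\n";   // 10 (4 single-byte + 3 multibyte characters)
```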

-----------------------------------------------------------------------------
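I'd like to apply one of these two, but I'm not sure it will work. I'm at home right now and just browsing around,
so I'll try it when I get to work tomorrow. In the meantime, could the experts here look over the source?
I can't really read this stuff myself ^^

Also, I've heard that on PHP 5
you can use iconv_substr(), and my machine is set up with PHP 5, so I may
try that too.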

iconv_substr

(PHP 5)
iconv_substr -- Cut out part of a string
Description
string iconv_substr ( string str, int offset [, int length [, string charset]])

Returns the portion of str specified by the offset and length parameters.

If offset is non-negative, iconv_substr() cuts the portion out of str beginning at the offset'th character, counting from zero.

If offset is negative, iconv_substr() cuts out the portion beginning at the position offset characters away from the end of str.

If length is given and is positive, the return value will contain at most length characters of the portion that begins at offset (depending on the length of the string). If str is shorter than offset characters long, FALSE will be returned.

If a negative length is passed, iconv_substr() cuts the portion out of str from the offset'th character up to the character that is length characters away from the end of the string. In case offset is also negative, the start position is calculated beforehand according to the rule explained above.

Note that offset and length parameters are always deemed to represent offsets that are calculated on the basis of the character set determined by charset, whilst the counterpart substr() always takes these for byte offsets. If charset is not given, the character set is determined by the iconv.internal_charset ini setting.
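
A small sketch of the difference described above, assuming the iconv extension is available and the file is saved as UTF-8. The sample string is made up for illustration.

```php
<?php
// Assumes the iconv extension is loaded and this file is saved as UTF-8.
$s = "가나다라마";   // 5 characters, 15 bytes

// Offsets and lengths are counted in characters of the given charset.
$middle = iconv_substr($s, 1, 3, "UTF-8");    // 나다라
$tail   = iconv_substr($s, -2, 2, "UTF-8");   // 라마

// substr() counts bytes, so 4 bytes is one whole character plus one stray byte.
$broken = substr($s, 0, 4);

echo $middle, "\n", $tail, "\n";
echo strlen($broken), "\n";                   // 4
```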

--------------------------------------------------
I plan to modify the cut_str() function in lib/common.lib.php and use that. Any advice would be appreciated.

All comments

태권보이

05.12.04 10:17:32

This one looks worth trying too.

Thanks Darien from /freenode #php for the following example (a little bit changed).

It just prints the 6th character of $string.
You can replace the digits by the same in japanese, chinese or whatever language to make a test, it works perfect.

<?php
mb_internal_encoding("UTF-8");
$string = "0123456789";
$mystring = mb_substr($string,5,1);
echo $mystring;
?>

(I couldn't replace 0123456789 by chinese numbers for example here, because it's automatically converted into latin digits on this website, look :
零一二三四
五六七八九)

gilv
drraf at tlen dot pl
23-Feb-2005 11:44
Note: If the bounds fall outside the string, mb_substr() returns an empty _string_, whereas substr() returns _boolean_ false in this case.
Keep this in mind when using "===" comparisons.

Example code:
<?php

var_dump( substr( 'abc', 5, 2 ) ); // bool(false) before PHP 8.0, "" from PHP 8.0 on
var_dump( mb_substr( 'abc', 5, 2 ) ); // returns ""

?>

It's especially confusing when using mbstring with function overloading turned on.
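
For the title-cutting case in this thread, a character-count cut can be sketched with mbstring as below. The name cut_str_mb() is hypothetical (it is not Gnuboard's code), and the sketch assumes the mbstring extension is loaded and the text is UTF-8. Checking the length up front with mb_strlen() avoids having to compare results with "===" at all.

```php
<?php
// Hypothetical helper (not Gnuboard's code): cut by character count using mbstring.
function cut_str_mb($str, $len, $suffix = "…")
{
    // Short enough already: return unchanged, without a suffix.
    if (mb_strlen($str, "UTF-8") <= $len) {
        return $str;
    }
    return mb_substr($str, 0, $len, "UTF-8") . $suffix;
}

echo cut_str_mb("가나다라마", 3), "\n";   // 가나다…
echo cut_str_mb("abc", 5), "\n";         // abc
```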

태권보이

05.12.04 10:34:12

function cut_str($str, $len, $suffix="…")
{
    $s = iconv_substr($str, 0, $len, $g4[charset]);

    if (strlen($s) >= strlen($str))
        $suffix = "";
    return $s . $suffix;
}

Does this make sense? OTL
This one doesn't work... something about memory...
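
One thing that stands out in the snippet above is that $g4 is not visible inside a PHP function unless it is declared with "global $g4;", and the array key charset is written without quotes. Whether that is what triggered the memory error is not something this thread confirms. A minimal sketch that sidesteps both by passing the charset explicitly; the name cut_str_iconv() and the "UTF-8" default are assumptions, and the iconv extension is required.

```php
<?php
// Hypothetical variant of the snippet above; requires the iconv extension.
function cut_str_iconv($str, $len, $suffix = "…", $charset = "UTF-8")
{
    // $len is a character count here, not a byte count.
    $s = iconv_substr($str, 0, $len, $charset);

    // Same test as the original: if the byte lengths match, nothing was cut.
    if (strlen($s) >= strlen($str)) {
        $suffix = "";
    }
    return $s . $suffix;
}

echo cut_str_iconv("가나다라마", 2), "\n";   // 가나…
echo cut_str_iconv("abc", 5), "\n";         // abc
```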

태권보이

05.12.04 11:28:42

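function cut_str($str, $len, $suffix="…")
{
    // $s = substr($str, 0, $len);
    // Already short enough: return as-is and avoid reading past the end of $str.
    if (strlen($str) <= $len)
        return $str;
    // Back up while the byte at the cut point is a UTF-8 continuation byte.
    while ($len > 0 && ($ch = ord($str[$len])) >= 128 && ($ch < 192))
        $len--;
    $s = substr($str, 0, $len);
    if (strlen($s) >= strlen($str))
        $suffix = "";
    return $s . $suffix;
}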

This one works fine ^^ I'm so pleased.. I can't speak to the efficiency of the code, though.
Here's hoping it keeps working without problems...
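
One way to gain some confidence that the byte-based version never splits a character is to run it at every possible byte limit and check that each result is still valid UTF-8. The sketch below is only an illustration: it carries its own copy of cut_str() with a length guard added (the guard is an assumption, not part of the original post), the sample title is made up, and the file is assumed to be saved as UTF-8.

```php
<?php
// Copy of the byte-based cut_str() from the comment above, plus a length guard
// (the guard is an addition, so $str[$len] is never read past the end).
function cut_str($str, $len, $suffix = "…")
{
    if (strlen($str) <= $len) {
        return $str;
    }
    while ($len > 0 && ($ch = ord($str[$len])) >= 128 && $ch < 192) {
        $len--;
    }
    return substr($str, 0, $len) . $suffix;
}

$title = "PHP 게시판 제목 자르기";
$all_valid = true;
for ($limit = 0; $limit <= strlen($title) + 1; $limit++) {
    // preg_match() with the /u modifier returns false on malformed UTF-8.
    if (preg_match('//u', cut_str($title, $limit)) !== 1) {
        $all_valid = false;
    }
}
var_dump($all_valid);             // bool(true)
echo cut_str($title, 8), "\n";    // PHP 게…
```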

웨디

06.04.12 17:44:49

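I switched to the version 태권보이 posted and it runs very well.. with everything in utf8..
I needed to put titles trimmed with cut_str into posts, but the last character kept coming out garbled..
Thank you.. thanks to you it's solved..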

ㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋ

10.09.03 00:06:04

Thank you, 태권보이.~^^
