PNG  IHDRX cHRMz&u0`:pQ<bKGD pHYsodtIME MeqIDATxw]Wug^Qd˶ 6`!N:!@xI~)%7%@Bh&`lnjVF29gΨ4E$|>cɚ{gk= %,a KX%,a KX%,a KX%,a KX%,a KX%,a KX%, b` ǟzeאfp]<!SJmɤY޲ڿ,%c ~ع9VH.!Ͳz&QynֺTkRR.BLHi٪:l;@(!MԴ=žI,:o&N'Kù\vRmJ雵֫AWic H@" !: Cé||]k-Ha oݜ:y F())u]aG7*JV@J415p=sZH!=!DRʯvɱh~V\}v/GKY$n]"X"}t@ xS76^[bw4dsce)2dU0 CkMa-U5tvLƀ~mlMwfGE/-]7XAƟ`׮g ewxwC4\[~7@O-Q( a*XGƒ{ ՟}$_y3tĐƤatgvێi|K=uVyrŲlLӪuܿzwk$m87k( `múcE)"@rK( z4$D; 2kW=Xb$V[Ru819קR~qloѱDyįݎ*mxw]y5e4K@ЃI0A D@"BDk_)N\8͜9dz"fK0zɿvM /.:2O{ Nb=M=7>??Zuo32 DLD@D| &+֎C #B8ַ`bOb $D#ͮҪtx]%`ES`Ru[=¾!@Od37LJ0!OIR4m]GZRJu$‡c=%~s@6SKy?CeIh:[vR@Lh | (BhAMy=݃  G"'wzn޺~8ԽSh ~T*A:xR[ܹ?X[uKL_=fDȊ؂p0}7=D$Ekq!/t.*2ʼnDbŞ}DijYaȲ(""6HA;:LzxQ‘(SQQ}*PL*fc\s `/d'QXW, e`#kPGZuŞuO{{wm[&NBTiiI0bukcA9<4@SӊH*؎4U/'2U5.(9JuDfrޱtycU%j(:RUbArLֺN)udA':uGQN"-"Is.*+k@ `Ojs@yU/ H:l;@yyTn}_yw!VkRJ4P)~y#)r,D =ě"Q]ci'%HI4ZL0"MJy 8A{ aN<8D"1#IJi >XjX֔#@>-{vN!8tRݻ^)N_╗FJEk]CT՟ YP:_|H1@ CBk]yKYp|og?*dGvzنzӴzjֺNkC~AbZƷ`.H)=!QͷVTT(| u78y֮}|[8-Vjp%2JPk[}ԉaH8Wpqhwr:vWª<}l77_~{s۴V+RCģ%WRZ\AqHifɤL36: #F:p]Bq/z{0CU6ݳEv_^k7'>sq*+kH%a`0ԣisqにtү04gVgW΂iJiS'3w.w}l6MC2uԯ|>JF5`fV5m`Y**Db1FKNttu]4ccsQNnex/87+}xaUW9y>ͯ骵G{䩓Գ3+vU}~jJ.NFRD7<aJDB1#ҳgSb,+CS?/ VG J?|?,2#M9}B)MiE+G`-wo߫V`fio(}S^4e~V4bHOYb"b#E)dda:'?}׮4繏`{7Z"uny-?ǹ;0MKx{:_pÚmFמ:F " .LFQLG)Q8qN q¯¯3wOvxDb\. BKD9_NN &L:4D{mm o^tֽ:q!ƥ}K+<"m78N< ywsard5+вz~mnG)=}lYݧNj'QJS{S :UYS-952?&O-:W}(!6Mk4+>A>j+i|<<|;ر^߉=HE|V#F)Emm#}/"y GII웻Jі94+v뾧xu~5C95~ūH>c@덉pʃ1/4-A2G%7>m;–Y,cyyaln" ?ƻ!ʪ<{~h~i y.zZB̃/,雋SiC/JFMmBH&&FAbϓO^tubbb_hZ{_QZ-sύodFgO(6]TJA˯#`۶ɟ( %$&+V'~hiYy>922 Wp74Zkq+Ovn錄c>8~GqܲcWꂎz@"1A.}T)uiW4="jJ2W7mU/N0gcqܗOO}?9/wìXžΏ0 >֩(V^Rh32!Hj5`;O28؇2#ݕf3 ?sJd8NJ@7O0 b־?lldщ̡&|9C.8RTWwxWy46ah嘦mh٤&l zCy!PY?: CJyв]dm4ǜҐR޻RլhX{FƯanшQI@x' ao(kUUuxW_Ñ줮[w8 FRJ(8˼)_mQ _!RJhm=!cVmm ?sFOnll6Qk}alY}; "baӌ~M0w,Ggw2W:G/k2%R,_=u`WU R.9T"v,<\Ik޽/2110Ӿxc0gyC&Ny޽JҢrV6N ``یeA16"J³+Rj*;BϜkZPJaÍ<Jyw:NP8/D$ 011z֊Ⱳ3ι֘k1V_"h!JPIΣ'ɜ* aEAd:ݺ>y<}Lp&PlRfTb1]o .2EW\ͮ]38؋rTJsǏP@芎sF\> P^+dYJLbJ C-xϐn> ι$nj,;Ǖa FU *择|h ~izť3ᤓ`K'-f tL7JK+vf2)V'-sFuB4i+m+@My=O҈0"|Yxoj,3]:cо3 $#uŘ%Y"y죯LebqtҢVzq¼X)~>4L׶m~[1_k?kxֺQ`\ |ٛY4Ѯr!)N9{56(iNq}O()Em]=F&u?$HypWUeB\k]JɩSع9 Zqg4ZĊo oMcjZBU]B\TUd34ݝ~:7ڶSUsB0Z3srx 7`:5xcx !qZA!;%͚7&P H<WL!džOb5kF)xor^aujƍ7 Ǡ8/p^(L>ὴ-B,{ۇWzֺ^k]3\EE@7>lYBȝR.oHnXO/}sB|.i@ɥDB4tcm,@ӣgdtJ!lH$_vN166L__'Z)y&kH;:,Y7=J 9cG) V\hjiE;gya~%ks_nC~Er er)muuMg2;֫R)Md) ,¶ 2-wr#F7<-BBn~_(o=KO㭇[Xv eN_SMgSҐ BS헃D%g_N:/pe -wkG*9yYSZS.9cREL !k}<4_Xs#FmҶ:7R$i,fi!~' # !6/S6y@kZkZcX)%5V4P]VGYq%H1!;e1MV<!ϐHO021Dp= HMs~~a)ަu7G^];git!Frl]H/L$=AeUvZE4P\.,xi {-~p?2b#amXAHq)MWǾI_r`S Hz&|{ +ʖ_= (YS(_g0a03M`I&'9vl?MM+m~}*xT۲(fY*V4x@29s{DaY"toGNTO+xCAO~4Ϳ;p`Ѫ:>Ҵ7K 3}+0 387x\)a"/E>qpWB=1 ¨"MP(\xp߫́A3+J] n[ʼnӼaTbZUWb={~2ooKױӰp(CS\S筐R*JغV&&"FA}J>G֐p1ٸbk7 ŘH$JoN <8s^yk_[;gy-;߉DV{c B yce% aJhDȶ 2IdйIB/^n0tNtџdcKj4϶v~- CBcgqx9= PJ) dMsjpYB] GD4RDWX +h{y`,3ꊕ$`zj*N^TP4L:Iz9~6s) Ga:?y*J~?OrMwP\](21sZUD ?ܟQ5Q%ggW6QdO+\@ ̪X'GxN @'4=ˋ+*VwN ne_|(/BDfj5(Dq<*tNt1х!MV.C0 32b#?n0pzj#!38}޴o1KovCJ`8ŗ_"]] rDUy޲@ Ȗ-;xџ'^Y`zEd?0„ DAL18IS]VGq\4o !swV7ˣι%4FѮ~}6)OgS[~Q vcYbL!wG3 7띸*E Pql8=jT\꘿I(z<[6OrR8ºC~ډ]=rNl[g|v TMTղb-o}OrP^Q]<98S¤!k)G(Vkwyqyr޽Nv`N/e p/~NAOk \I:G6]4+K;j$R:Mi #*[AȚT,ʰ,;N{HZTGMoּy) ]%dHء9Պ䠬|<45,\=[bƟ8QXeB3- &dҩ^{>/86bXmZ]]yޚN[(WAHL$YAgDKp=5GHjU&99v簪C0vygln*P)9^͞}lMuiH!̍#DoRBn9l@ xA/_v=ȺT{7Yt2N"4!YN`ae >Q<XMydEB`VU}u]嫇.%e^ánE87Mu\t`cP=AD/G)sI"@MP;)]%fH9'FNsj1pVhY&9=0pfuJ&gޤx+k:!r˭wkl03׼Ku C &ѓYt{.O.zҏ z}/tf_wEp2gvX)GN#I ݭ߽v/ .& и(ZF{e"=V!{zW`, ]+LGz"(UJp|j( #V4, 8B 0 9OkRrlɱl94)'VH9=9W|>PS['G(*I1==C<5"Pg+x'K5EMd؞Af8lG ?D FtoB[je?{k3zQ vZ;%Ɠ,]E>KZ+T/ EJxOZ1i #T<@ I}q9/t'zi(EMqw`mYkU6;[t4DPeckeM;H}_g pMww}k6#H㶏+b8雡Sxp)&C $@'b,fPߑt$RbJ'vznuS ~8='72_`{q纶|Q)Xk}cPz9p7O:'|G~8wx(a 0QCko|0ASD>Ip=4Q, d|F8RcU"/KM opKle M3#i0c%<7׿p&pZq[TR"BpqauIp$ 8~Ĩ!8Սx\ւdT>>Z40ks7 z2IQ}ItԀ<-%S⍤};zIb$I 5K}Q͙D8UguWE$Jh )cu4N tZl+[]M4k8֦Zeq֮M7uIqG 1==tLtR,ƜSrHYt&QP윯Lg' I,3@P'}'R˪e/%-Auv·ñ\> vDJzlӾNv5:|K/Jb6KI9)Zh*ZAi`?S {aiVDԲuy5W7pWeQJk֤#5&V<̺@/GH?^τZL|IJNvI:'P=Ϛt"¨=cud S Q.Ki0 !cJy;LJR;G{BJy޺[^8fK6)=yʊ+(k|&xQ2`L?Ȓ2@Mf 0C`6-%pKpm')c$׻K5[J*U[/#hH!6acB JA _|uMvDyk y)6OPYjœ50VT K}cǻP[ $:]4MEA.y)|B)cf-A?(e|lɉ#P9V)[9t.EiQPDѠ3ϴ;E:+Օ t ȥ~|_N2,ZJLt4! %ա]u {+=p.GhNcŞQI?Nd'yeh n7zi1DB)1S | S#ًZs2|Ɛy$F SxeX{7Vl.Src3E℃Q>b6G ўYCmtկ~=K0f(=LrAS GN'ɹ9<\!a`)֕y[uՍ[09` 9 +57ts6}b4{oqd+J5fa/,97J#6yν99mRWxJyѡyu_TJc`~W>l^q#Ts#2"nD1%fS)FU w{ܯ R{ ˎ󅃏џDsZSQS;LV;7 Od1&1n$ N /.q3~eNɪ]E#oM~}v֯FڦwyZ=<<>Xo稯lfMFV6p02|*=tV!c~]fa5Y^Q_WN|Vs 0ҘދU97OI'N2'8N֭fgg-}V%y]U4 峧p*91#9U kCac_AFңĪy뚇Y_AiuYyTTYЗ-(!JFLt›17uTozc. S;7A&&<ԋ5y;Ro+:' *eYJkWR[@F %SHWP 72k4 qLd'J "zB6{AC0ƁA6U.'F3:Ȅ(9ΜL;D]m8ڥ9}dU "v!;*13Rg^fJyShyy5auA?ɩGHRjo^]׽S)Fm\toy 4WQS@mE#%5ʈfFYDX ~D5Ϡ9tE9So_aU4?Ѽm%&c{n>.KW1Tlb}:j uGi(JgcYj0qn+>) %\!4{LaJso d||u//P_y7iRJ߬nHOy) l+@$($VFIQ9%EeKʈU. ia&FY̒mZ=)+qqoQn >L!qCiDB;Y<%} OgBxB!ØuG)WG9y(Ą{_yesuZmZZey'Wg#C~1Cev@0D $a@˲(.._GimA:uyw֬%;@!JkQVM_Ow:P.s\)ot- ˹"`B,e CRtaEUP<0'}r3[>?G8xU~Nqu;Wm8\RIkբ^5@k+5(By'L&'gBJ3ݶ!/㮻w҅ yqPWUg<e"Qy*167΃sJ\oz]T*UQ<\FԎ`HaNmڜ6DysCask8wP8y9``GJ9lF\G g's Nn͵MLN֪u$| /|7=]O)6s !ĴAKh]q_ap $HH'\1jB^s\|- W1:=6lJBqjY^LsPk""`]w)󭃈,(HC ?䔨Y$Sʣ{4Z+0NvQkhol6C.婧/u]FwiVjZka&%6\F*Ny#8O,22+|Db~d ~Çwc N:FuuCe&oZ(l;@ee-+Wn`44AMK➝2BRՈt7g*1gph9N) *"TF*R(#'88pm=}X]u[i7bEc|\~EMn}P瘊J)K.0i1M6=7'_\kaZ(Th{K*GJyytw"IO-PWJk)..axӝ47"89Cc7ĐBiZx 7m!fy|ϿF9CbȩV 9V-՛^pV̌ɄS#Bv4-@]Vxt-Z, &ֺ*diؠ2^VXbs֔Ìl.jQ]Y[47gj=幽ex)A0ip׳ W2[ᎇhuE^~q흙L} #-b۸oFJ_QP3r6jr+"nfzRJTUqoaۍ /$d8Mx'ݓ= OՃ| )$2mcM*cЙj}f };n YG w0Ia!1Q.oYfr]DyISaP}"dIӗթO67jqR ҊƐƈaɤGG|h;t]䗖oSv|iZqX)oalv;۩meEJ\!8=$4QU4Xo&VEĊ YS^E#d,yX_> ۘ-e\ "Wa6uLĜZi`aD9.% w~mB(02G[6y.773a7 /=o7D)$Z 66 $bY^\CuP. (x'"J60׿Y:Oi;F{w佩b+\Yi`TDWa~|VH)8q/=9!g߆2Y)?ND)%?Ǐ`k/sn:;O299yB=a[Ng 3˲N}vLNy;*?x?~L&=xyӴ~}q{qE*IQ^^ͧvü{Huu=R|>JyUlZV, B~/YF!Y\u_ݼF{_C)LD]m {H 0ihhadd nUkf3oٺCvE\)QJi+֥@tDJkB$1!Đr0XQ|q?d2) Ӣ_}qv-< FŊ߫%roppVBwü~JidY4:}L6M7f٬F "?71<2#?Jyy4뷢<_a7_=Q E=S1И/9{+93֮E{ǂw{))?maÆm(uLE#lïZ  ~d];+]h j?!|$F}*"4(v'8s<ŏUkm7^7no1w2ؗ}TrͿEk>p'8OB7d7R(A 9.*Mi^ͳ; eeUwS+C)uO@ =Sy]` }l8^ZzRXj[^iUɺ$tj))<sbDJfg=Pk_{xaKo1:-uyG0M ԃ\0Lvuy'ȱc2Ji AdyVgVh!{]/&}}ċJ#%d !+87<;qN޼Nفl|1N:8ya  8}k¾+-$4FiZYÔXk*I&'@iI99)HSh4+2G:tGhS^繿 Kتm0 вDk}֚+QT4;sC}rՅE,8CX-e~>G&'9xpW,%Fh,Ry56Y–hW-(v_,? ; qrBk4-V7HQ;ˇ^Gv1JVV%,ik;D_W!))+BoS4QsTM;gt+ndS-~:11Sgv!0qRVh!"Ȋ(̦Yl.]PQWgٳE'`%W1{ndΗBk|Ž7ʒR~,lnoa&:ü$ 3<a[CBݮwt"o\ePJ=Hz"_c^Z.#ˆ*x z̝grY]tdkP*:97YľXyBkD4N.C_[;F9`8& !AMO c `@BA& Ost\-\NX+Xp < !bj3C&QL+*&kAQ=04}cC!9~820G'PC9xa!w&bo_1 Sw"ܱ V )Yl3+ס2KoXOx]"`^WOy :3GO0g;%Yv㐫(R/r (s } u B &FeYZh0y> =2<Ϟc/ -u= c&׭,.0"g"7 6T!vl#sc>{u/Oh Bᾈ)۴74]x7 gMӒ"d]U)}" v4co[ ɡs 5Gg=XR14?5A}D "b{0$L .\4y{_fe:kVS\\O]c^W52LSBDM! C3Dhr̦RtArx4&agaN3Cf<Ԉp4~ B'"1@.b_/xQ} _߃҉/gٓ2Qkqp0շpZ2fԫYz< 4L.Cyυι1t@鎫Fe sYfsF}^ V}N<_`p)alٶ "(XEAVZ<)2},:Ir*#m_YӼ R%a||EƼIJ,,+f"96r/}0jE/)s)cjW#w'Sʯ5<66lj$a~3Kʛy 2:cZ:Yh))+a߭K::N,Q F'qB]={.]h85C9cr=}*rk?vwV렵ٸW Rs%}rNAkDv|uFLBkWY YkX מ|)1!$#3%y?pF<@<Rr0}: }\J [5FRxY<9"SQdE(Q*Qʻ)q1E0B_O24[U'],lOb ]~WjHޏTQ5Syu wq)xnw8~)c 쫬gٲߠ H% k5dƝk> kEj,0% b"vi2Wس_CuK)K{n|>t{P1򨾜j>'kEkƗBg*H%'_aY6Bn!TL&ɌOb{c`'d^{t\i^[uɐ[}q0lM˕G:‚4kb祔c^:?bpg… +37stH:0}en6x˟%/<]BL&* 5&fK9Mq)/iyqtA%kUe[ڛKN]Ě^,"`/ s[EQQm?|XJ߅92m]G.E΃ח U*Cn.j_)Tѧj̿30ڇ!A0=͜ar I3$C^-9#|pk!)?7.x9 @OO;WƝZBFU keZ75F6Tc6"ZȚs2y/1 ʵ:u4xa`C>6Rb/Yм)^=+~uRd`/|_8xbB0?Ft||Z\##|K 0>>zxv8۴吅q 8ĥ)"6>~\8:qM}#͚'ĉ#p\׶ l#bA?)|g g9|8jP(cr,BwV (WliVxxᡁ@0Okn;ɥh$_ckCgriv}>=wGzβ KkBɛ[˪ !J)h&k2%07δt}!d<9;I&0wV/ v 0<H}L&8ob%Hi|޶o&h1L|u֦y~󛱢8fٲUsւ)0oiFx2}X[zVYr_;N(w]_4B@OanC?gĦx>мgx>ΛToZoOMp>40>V Oy V9iq!4 LN,ˢu{jsz]|"R޻&'ƚ{53ўFu(<٪9:΋]B;)B>1::8;~)Yt|0(pw2N%&X,URBK)3\zz&}ax4;ǟ(tLNg{N|Ǽ\G#C9g$^\}p?556]/RP.90 k,U8/u776s ʪ_01چ|\N 0VV*3H鴃J7iI!wG_^ypl}r*jɤSR 5QN@ iZ#1ٰy;_\3\BQQ x:WJv츟ٯ$"@6 S#qe딇(/P( Dy~TOϻ<4:-+F`0||;Xl-"uw$Цi󼕝mKʩorz"mϺ$F:~E'ҐvD\y?Rr8_He@ e~O,T.(ފR*cY^m|cVR[8 JҡSm!ΆԨb)RHG{?MpqrmN>߶Y)\p,d#xۆWY*,l6]v0h15M˙MS8+EdI='LBJIH7_9{Caз*Lq,dt >+~ّeʏ?xԕ4bBAŚjﵫ!'\Ը$WNvKO}ӽmSşذqsOy?\[,d@'73'j%kOe`1.g2"e =YIzS2|zŐƄa\U,dP;jhhhaxǶ?КZ՚.q SE+XrbOu%\GتX(H,N^~]JyEZQKceTQ]VGYqnah;y$cQahT&QPZ*iZ8UQQM.qo/T\7X"u?Mttl2Xq(IoW{R^ ux*SYJ! 4S.Jy~ BROS[V|žKNɛP(L6V^|cR7i7nZW1Fd@ Ara{詑|(T*dN]Ko?s=@ |_EvF]׍kR)eBJc" MUUbY6`~V޴dJKß&~'d3i WWWWWW
Current Directory: /usr/share/doc/grep-2.20
Viewing File: /usr/share/doc/grep-2.20/TODO
Copyright (C) 1992, 1997-2002, 2004-2014 Free Software Foundation, Inc. Copying and distribution of this file, with or without modification, are permitted in any medium without royalty provided the copyright notice and this notice are preserved. =============== Short term work =============== See where we are with UTF-8 performance. Merge Debian patches 55-bigfile.patch, 69-mbtowc.patch and 70-man_apostrophe.patch. Go through patches in Savannah. Cleanup of the grep(), grepdir(), recursion (the "main loop") to use fts. Fix --directories=read. Write better Texinfo documentation for grep. The manual page would be a good place to start, but Info documents are also supposed to contain a tutorial and examples. Some test in tests/spencer2.tests should have failed! Need to filter out some bugs in dfa.[ch]/regex.[ch]. Multithreading? GNU grep does 32-bit arithmetic, it needs to move to 64-bit (i.e. size_t/ptrdiff_t). Lazy dynamic linking of libpcre. Check FreeBSD's integration of zgrep (-Z) and bzgrep (-J) in one binary. Is there a possibility of doing even better by automatically checking the magic of binary files ourselves (0x1F 0x8B for gzip, 0x1F 0x9D for compress, and 0x42 0x5A 0x68 for bzip2)? Once what to do with libpcre is decided, do the same for libz and libbz2. ================== Matching algorithms ================== Check <http://tony.abou-assaleh.net/greps.html>. Take a look at these and consider opportunities for merging or cloning: -- ja-grep's mlb2 patch (Japanese grep) <ftp://ftp.freebsd.org/pub/FreeBSD/ports/distfiles/grep-2.4.2-mlb2.patch.gz> -- lgrep (from lv, a Powerful Multilingual File Viewer / Grep) <http://www.ff.iij4u.or.jp/~nrt/lv/>; -- cgrep (Context grep) <http://plg.uwaterloo.ca/~ftp/mt/cgrep/> seems like nice work; -- sgrep (Struct grep) <http://www.cs.helsinki.fi/u/jjaakkol/sgrep.html>; -- agrep (Approximate grep) <http://www.tgries.de/agrep/>, from glimpse; -- nr-grep (Nondeterministic reverse grep) <http://www.dcc.uchile.cl/~gnavarro/software/>; -- ggrep (Grouse grep) <http://www.grouse.com.au/ggrep/>; -- grep.py (Python grep) <http://www.vdesmedt.com/~vds2212/grep.html>; -- freegrep <http://www.vocito.com/downloads/software/grep/>; Check some new algorithms for matching; talk to Karl Berry and Nelson. Sunday's "Quick Search" Algorithm (CACM 33, 1990-08-08 pp. 132-142) claim that his algorithm is faster than Boyer-More. Worth checking. Fix the DFA matcher to never use exponential space. (Fortunately, these cases are rare.) ============================ Standards: POSIX and Unicode ============================ For POSIX compliance, see p10003.x. Current support for the POSIX [= =] and [. .] constructs is limited. This is difficult because it requires locale-dependent details of the character set and collating sequence, but POSIX does not standardize any method for accessing this information! For Unicode, interesting things to check include the Unicode Standard <http://www.unicode.org/standard/standard.html> and the Unicode Technical Standard #18 (<http://www.unicode.org/reports/tr18/> “Unicode Regular Expressions”). Talk to Bruno Haible who's maintaining GNU libunistring. See also Unicode Standard Annex #15 (<http://www.unicode.org/reports/tr15/> “Unicode Normalization Forms”), already implemented by GNU libunistring. In particular, --ignore-case needs to be evaluated against the standards. We may want to deviate from POSIX if Unicode provides better or clearer semantics. POSIX and --ignore-case ----------------------- For this issue, interesting things to check in POSIX include the Volume “Base Definitions (XBD)”, Chapter “Regular Expressions” and in particular Section “Regular Expression General Requirements” and its paragraph about caseless matching (note that this may not have been fully thought through and that this text may be self-contradicting [specifically: “of either data or patterns” versus all the rest]). In particular, consider the following with POSIX's approach to case folding in mind. Assume a non-Turkic locale with a character repertoire reduced to the following various forms of “LATIN LETTER I”: 0049;LATIN CAPITAL LETTER I;Lu;0;L;;;;;N;;;;0069; 0069;LATIN SMALL LETTER I;Ll;0;L;;;;;N;;;0049;;0049 0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;LATIN CAPITAL LETTER I DOT;;;0069; 0131;LATIN SMALL LETTER DOTLESS I;Ll;0;L;;;;;N;;;0049;;0049 First note the differing UTF-8 octet lengths of U+0049 (0x49) and U+0069 (0x69) versus U+0130 (0xC4 0xB0) and U+0131 (0xC4 0xB1). This implies that whole UTF-8 strings cannot be case-converted in place, using the same memory buffer, and that the needed octet-size of the new buffer cannot merely be guessed (although there's a simple upper bound of six times the size of the input, as the longest UTF-8 encoding of any character is six bytes). We have lc(I) = i, uc(I) = I lc(i) = i, uc(i) = I lc(İ) = i, uc(İ) = İ lc(ı) = ı, uc(ı) = I where lc() and uc() denote lower-case and upper-case conversions. There are several candidate --ignore-case logics (including the one mandated by POSIX): Using the if (lc(input_wchar) == lc(pattern_wchar)) logic leads to the following matches: \in I i İ ı pat\ ---------- "I" | Y Y Y n "i" | Y Y Y n "İ" | Y Y Y n "ı" | n n n Y There is a lack of symmetry between CAPITAL and SMALL LETTERs with this. Using the if (uc(input_wchar) == uc(pattern_wchar)) logic leads to the following matches: \in I i İ ı pat\ ---------- "I" | Y Y n Y "i" | Y Y n Y "İ" | n n Y n "ı" | Y Y n Y There is a lack of symmetry between CAPITAL and SMALL LETTERs with this. Using the if ( lc(input_wchar) == lc(pattern_wchar) || uc(input_wchar) == uc(pattern_wchar)) logic leads to the following matches: \in I i İ ı pat\ ---------- "I" | Y Y Y Y "i" | Y Y Y Y "İ" | Y Y Y n "ı" | Y Y n Y There is some elegance and symmetry with this. But there are potentially two conversions to be made per input character. If the pattern is pre-converted, two copies of it need to be kept and used in a mutually coherent fashion. Using the if ( input_wchar == pattern_wchar || lc(input_wchar) == pattern_wchar || uc(input_wchar) == pattern_wchar) logic (as mandated by POSIX) leads to the following matches: \in I i İ ı pat\ ---------- "I" | Y Y n Y "i" | Y Y Y n "İ" | n n Y n "ı" | n n n Y There is a different CAPITAL/SMALL symmetry with this. But there's also a loss of pattern/input symmetry that's unique to it. Also there are potentially two conversions to be made per input character. Using the if (lc(uc(input_wchar)) == lc(uc(pattern_wchar))) logic leads to the following matches: \in I i İ ı pat\ ---------- "I" | Y Y Y Y "i" | Y Y Y Y "İ" | Y Y Y Y "ı" | Y Y Y Y This shows total symmetry and transitivity (at least in this example analysis). There are two conversions to be made per input character, but support could be added for having a single straight mapping performing a composition of the two conversions. Any optimization in the implementation of each logic must not change its basic semantic. Unicode and --ignore-case ------------------------- For this issue, interesting things to check in Unicode include: -- The Unicode Standard, Chapter 3 (<http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf> “Conformance”), Section 3.13 (“Default Case Operations”) and the toCasefold() case conversion operation. -- The Unicode Standard, Chapter 4 (<http://www.unicode.org/versions/Unicode5.2.0/ch04.pdf> “Character Properties”), Section 4.2 (“Case—Normative”) and the <http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt> SpecialCasing.txt and <http://www.unicode.org/Public/UNIDATA/CaseFolding.txt> CaseFolding.txt files from the <http://www.unicode.org/Public/UNIDATA/UCD.html> Unicode Character Database . The <http://www.unicode.org/standard/standard.html> Unicode Standard, Chapter 5 (“<http://www.unicode.org/versions/Unicode5.2.0/ch05.pdf> Implementation Guidelines ”), Section 5.18 (“Case Mappings”), Subsection “Caseless Matching”. The Unicode <http://www.unicode.org/charts/case/> case charts. Unicode uses the if (toCasefold(input_wchar_string) == toCasefold(pattern_wchar_string)) logic for caseless matching. Let's consider the “LATIN LETTER I” example mentioned above. In a non-Turkic locale, simple case folding yields toCasefold_simple(U+0049) = U+0069 toCasefold_simple(U+0069) = U+0069 toCasefold_simple(U+0130) = U+0130 toCasefold_simple(U+0131) = U+0131 which leads to the following matches: \in I i İ ı pat\ ---------- "I" | Y Y n n "i" | Y Y n n "İ" | n n Y n "ı" | n n n Y This is different from anything so far! In a non-Turkic locale, full case folding yields toCasefold_full(U+0049) = U+0069 toCasefold_full(U+0069) = U+0069 toCasefold_full(U+0130) = <U+0069, U+0307> toCasefold_full(U+0131) = U+0131 with 0307;COMBINING DOT ABOVE;Mn;230;NSM;;;;;N;NON-SPACING DOT ABOVE;;;; which leads to the following matches: \in I i İ ı pat\ ---------- "I" | Y Y * n "i" | Y Y * n "İ" | n n Y n "ı" | n n n Y This is just sad! Note that having toCasefold(U+0131), simple or full, map to itself instead of U+0069 is in contradiction with the rules of Section 5.18 of the Unicode Standard since toUpperCase(U+0131) is U+0049. Same thing for toCasefold_simple(U+0130) since toLowerCase(U+0131) is U+0069. The justification for the weird toCasefold_full(U+0130) mapping is unknown; it doesn't even make sense to add a dot (U+0307) to a letter that already has one (U+0069). It would have been so simple to put them all in the same equivalence class! Otherwise, also consider the following problem with Unicode's approach on case folding in mind. Assume that we want to perform echo 'AßBC | grep -i 'Sb' which corresponds to input: U+0041 U+00DF U+0042 U+0043 U+000A pattern: U+0053 U+0062 Following “CaseFolding-4.1.0.txt”, applying the toCasefold() transformation to these yields input: U+0061 U+0073 U+0073 U+0062 U+0063 U+000A pattern: U+0073 U+0062 so, according to this approach, the input should match the pattern. As long as the original input line is to be reported to the user as a whole, there is no problem (from the user's point-of-view; implementation is complicated by this). However, consider both these GNU extensions: echo 'AßBC' | grep -i --only-matching 'Sb' echo 'AßBC' | grep -i --color=always 'Sb' What is to be reported in these cases, since the match begins in the *middle* of the original input character 'ß'? Note that Unicode's toCasefold() cannot be implemented in terms of POSIX' towctrans() since that can only return a single wint_t value per input wint_t value.