Version: 2.0.0-beta3-4-armbian20.11.0-trunk1
wiperf/usr/local/lib/python3.7/dist-packages/chardet/__pycache__/charsetprober.cpython-37.pyc
import logging
import re

from .enums import ProbingState


class CharSetProber(object):
    SHORTCUT_THRESHOLD = 0.95

    def __init__(self, lang_filter=None):
        self._state = None
        self.lang_filter = lang_filter
        self.logger = logging.getLogger(__name__)

    def reset(self):
        self._state = ProbingState.DETECTING

    @property
    def charset_name(self):
        return None

    def feed(self, buf):
        pass

    @property
    def state(self):
        return self._state

    def get_confidence(self):
        return 0.0

    @staticmethod
    def filter_high_byte_only(buf):
        # Collapse every run of ASCII bytes into a single space.
        buf = re.sub(b'([\x00-\x7F])+', b' ', buf)
        return buf

    @staticmethod
    def filter_international_words(buf):
        """
        We define three types of bytes:
        alphabet: english alphabets [a-zA-Z]
        international: international characters [\x80-\xFF]
        marker: everything else [^a-zA-Z\x80-\xFF]
        The input buffer can be thought to contain a series of words delimited
        by markers. This function works to filter all words that contain at
        least one international character. All contiguous sequences of markers
        are replaced by a single space ascii character.
        This filter applies to all scripts which do not use English characters.
        """
        filtered = bytearray()
        # Each match is a word with at least one international byte, plus at most one trailing marker.
        words = re.findall(b'[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?',
                           buf)
        for word in words:
            filtered.extend(word[:-1])
            # A trailing marker byte is normalized to a single space.
            last_char = word[-1:]
            if not last_char.isalpha() and last_char < b'\x80':
                last_char = b' '
            filtered.extend(last_char)
        return filtered

    @staticmethod
    def filter_with_english_letters(buf):
        """
        Returns a copy of ``buf`` that retains only the sequences of English
        alphabet and high byte characters that are not between <> characters.
        Also retains English alphabet and high byte characters immediately
        before occurrences of >.
        This filter can be applied to all scripts which contain both English
        characters and extended ASCII characters, but is currently only used by
        ``Latin1Prober``.
        """
        filtered = bytearray()
        in_tag = False
        prev = 0
        for curr in range(len(buf)):
            # Slice rather than index so buf_char is bytes, not an int.
            buf_char = buf[curr:curr + 1]
            if buf_char == b'>':
                in_tag = False
            elif buf_char == b'<':
                in_tag = True
            if buf_char < b'\x80' and not buf_char.isalpha():
                if curr > prev and not in_tag:
                    filtered.extend(buf[prev:curr])
                    filtered.extend(b' ')
                prev = curr + 1
        if not in_tag:
            filtered.extend(buf[prev:])
        return filtered
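`CharSetProber` itself only fixes an interface. `reset()` puts a prober into the detecting state, `feed()` consumes bytes, and `state` and `get_confidence()` report progress. `SHORTCUT_THRESHOLD` (0.95) is, as far as we know, the confidence above which subclasses may declare a match early. The sketch below is a hypothetical, self-contained prober written against that interface. `AsciiOnlyProber`, its scoring rule, and the local `ProbingState` enum are illustrative assumptions and are not part of chardet. The enum assumes the member names `DETECTING`, `FOUND_IT` and `NOT_ME`.

```python
import logging
from enum import Enum


class ProbingState(Enum):
    # Local stand-in; member names are assumed to mirror chardet's ProbingState.
    DETECTING = 0
    FOUND_IT = 1
    NOT_ME = 2


class AsciiOnlyProber:
    """Hypothetical prober following the CharSetProber interface."""

    SHORTCUT_THRESHOLD = 0.95

    def __init__(self, lang_filter=None):
        self.lang_filter = lang_filter
        self.logger = logging.getLogger(__name__)
        self._seen = 0
        self.reset()

    def reset(self):
        self._state = ProbingState.DETECTING
        self._seen = 0

    @property
    def charset_name(self):
        return "ascii"

    @property
    def state(self):
        return self._state

    def feed(self, buf):
        # Once a verdict has been reached, further input is ignored.
        if self._state is not ProbingState.DETECTING:
            return self._state
        # Any byte >= 0x80 rules out plain ASCII.
        if any(byte >= 0x80 for byte in buf):
            self._state = ProbingState.NOT_ME
            return self._state
        self._seen += len(buf)
        # Stop early once confidence clears the shortcut threshold.
        if self.get_confidence() > self.SHORTCUT_THRESHOLD:
            self._state = ProbingState.FOUND_IT
        return self._state

    def get_confidence(self):
        if self._state is ProbingState.NOT_ME:
            return 0.0
        # Hypothetical scoring: confidence grows with the clean ASCII bytes seen.
        return min(0.99, self._seen / 100)


prober = AsciiOnlyProber()
print(prober.feed(b"hello").name)    # DETECTING
print(prober.feed(b"a" * 100).name)  # FOUND_IT
```

The early exit is the point of the threshold. A detector running several probers can stop feeding once one of them reports `FOUND_IT`, rather than scanning the whole input.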
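The byte classes behind the two high-byte filters (alphabet `[a-zA-Z]`, international `[\x80-\xFF]`, marker for everything else) are easiest to see on concrete input. The sketch below restates both regular expressions as standalone module-level functions so they can be run without chardet installed. In chardet they are static methods of `CharSetProber`, and the module-level names here are our own. The sample sentence is arbitrary Latin-1 text.

```python
import re

HIGH_BYTE_ONLY = re.compile(b'([\x00-\x7F])+')
INTERNATIONAL_WORD = re.compile(
    b'[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?')


def filter_high_byte_only(buf: bytes) -> bytes:
    # Every run of ASCII bytes collapses to one space; high bytes are kept.
    return HIGH_BYTE_ONLY.sub(b' ', buf)


def filter_international_words(buf: bytes) -> bytearray:
    filtered = bytearray()
    # Only words containing at least one international byte are matched.
    for word in INTERNATIONAL_WORD.findall(buf):
        filtered.extend(word[:-1])
        last_char = word[-1:]
        # A trailing ASCII non-letter (a marker) becomes a single space.
        if not last_char.isalpha() and last_char < b'\x80':
            last_char = b' '
        filtered.extend(last_char)
    return filtered


sample = "un café, très fort!".encode("latin-1")
print(filter_international_words(sample))  # bytearray(b'caf\xe9 tr\xe8s ')
print(filter_high_byte_only(sample))       # b' \xe9 \xe8 '
```

The two filters differ in what survives. `filter_international_words` keeps whole words such as `café` and `très`, including their ASCII letters. `filter_high_byte_only` keeps only the high bytes themselves. Pure-ASCII words such as `un` and `fort` are discarded by both.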
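`filter_with_english_letters` removes markup while keeping runs of letters and high bytes. The sketch below is a standalone copy of that logic, so it can be run without chardet. The function name mirrors the static method, and the HTML snippet is an arbitrary example of ours. It shows both behaviours stated in the docstring: text inside `<...>` is dropped, and a run sitting immediately before `>` is kept.

```python
def filter_with_english_letters(buf: bytes) -> bytearray:
    filtered = bytearray()
    in_tag = False
    prev = 0  # start index of the current run of letters / high bytes
    for curr in range(len(buf)):
        buf_char = buf[curr:curr + 1]  # 1-byte bytes object, not an int
        if buf_char == b'>':
            in_tag = False
        elif buf_char == b'<':
            in_tag = True
        # An ASCII non-letter ends the run; the run is kept only outside a tag.
        if buf_char < b'\x80' and not buf_char.isalpha():
            if curr > prev and not in_tag:
                filtered.extend(buf[prev:curr])
                filtered.extend(b' ')
            prev = curr + 1
    # Flush the final run unless the buffer ends inside a tag.
    if not in_tag:
        filtered.extend(buf[prev:])
    return filtered


html = b'<p class="c">caf\xe9 au lait.</p>'
print(filter_with_english_letters(html))  # bytearray(b'caf\xe9 au lait p ')
```

The attribute text `class="c"` disappears. The closing tag name `p` survives because it ends at `>`, where `in_tag` has just been cleared. Conversely, a run followed directly by `<` is not kept, because `in_tag` is set before the run is flushed. With this logic, `b"<b>caf\xe9</b>"` filters to `b"b b "`.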