This is the web page filter editor, where you can edit the matching rules used to rewrite web pages.
The Basics
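Matching rules work much like the "find and replace" feature of a word processor. Any text that fits the matching expression is replaced with the replacement text. For example, a rule that replaces "Rimmer" with "That Smeghead" will turn every "Rimmer" on the page into "That Smeghead". Simple, right?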
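As a rough illustration, the simplest kind of rule behaves like Python's built-in `str.replace`. This is only a sketch: `apply_literal_rule` is a hypothetical name, and real rules also support wildcards, bounds and byte limits, none of which are modelled here.

```python
# Sketch only: a hypothetical literal rule, no wildcards, bounds or byte limit.
def apply_literal_rule(page: str, match: str, replace: str) -> str:
    """Replace every occurrence of `match` in `page` with `replace`."""
    return page.replace(match, replace)


print(apply_literal_rule("Rimmer walked in. Nobody noticed Rimmer.",
                         "Rimmer", "That Smeghead"))
# prints: That Smeghead walked in. Nobody noticed That Smeghead.
```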
The fun really begins once you start adding matching rules of your own.
"<start>" and "<end>"
Aside from normal text and matching commands, two values have a special meaning in the matching expression: <start> and <end>.
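<start> inserts the replacement text at the very beginning of the web page, which is handy for adding JavaScript to a page. Likewise, <end> inserts it at the very end of the page.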
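For these special cases, bounds and limit are ignored. When several rules use them, the items are added in the same order the rules appear in the web page filter list.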
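Here is a minimal Python sketch of how <start> and <end> text might be assembled, assuming a hypothetical `Rule` record with `match` and `replace` fields. It only models the two points made above: bounds and limit play no part, and the filter-list order is preserved.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule record used only for this sketch."""
    match: str    # "<start>", "<end>", or an ordinary matching expression
    replace: str  # text to insert


def apply_start_end(page: str, rules: list[Rule]) -> str:
    """Add <start>/<end> replacement text, keeping the filter-list order.

    Bounds and byte limit are not consulted: the text simply goes at the
    very beginning or the very end of the page.
    """
    head = "".join(r.replace for r in rules if r.match == "<start>")
    tail = "".join(r.replace for r in rules if r.match == "<end>")
    return head + page + tail


rules = [Rule("<start>", "<script>first()</script>"),
         Rule("<end>", "<!-- footer -->"),
         Rule("<start>", "<script>second()</script>")]
print(apply_start_end("<html></html>", rules))
# prints: <script>first()</script><script>second()</script><html></html><!-- footer -->
```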
What the heck is this Scope thing anyway?
In HTML it's not uncommon for tags to run for several lines. Scope settings allow the filter to determine how far forward to search for the end of a match after finding the start. Without scope, the entire web page might have to be scanned before a rule could be sure there was no match. That's not a good idea, since no data could be sent to your browser until the whole page had finished loading. Thankfully, the designers of HTML gave their tags predictable beginning and ending values, which makes things a bit easier.
The byte limit and the bounds limit work together to restrict the amount of text searched.
The byte limit controls how many characters forward to look for a match before giving up. Normally keep it as small as possible: for most tags, a value of 128-256 or even less is fine. Increase it if you find that a rule which should match isn't working. Making it too large, however, can make pages appear to load slower, since the program must process more data before sending anything to your browser.
Often the best size depends heavily on the tag in question. The "<script ... </script>" tag, for instance, often needs a large limit since it may contain many lines of JavaScript. In this case, try a limit of around 4096.
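To picture the byte limit, think of it as a search window that opens where a match starts. The sketch below assumes characters stand in for bytes and uses plain strings for the start and end of a match; `find_with_limit` is a hypothetical helper, not part of the program.

```python
def find_with_limit(text: str, start: str, end: str, byte_limit: int, pos: int = 0):
    """Find `start`, then look for `end` no further than `byte_limit` characters on.

    Hypothetical helper: characters stand in for bytes. Returns the (begin, finish)
    span of the match, or None when `end` does not fit inside the window.
    """
    begin = text.find(start, pos)
    if begin == -1:
        return None
    window_end = begin + byte_limit  # give up past this point
    finish = text.find(end, begin + len(start), window_end)
    if finish == -1:
        return None
    return begin, finish + len(end)


script = "<script>" + "x = 1;\n" * 100 + "</script>"
print(find_with_limit(script, "<script", "</script>", 256))   # None: window too small
print(find_with_limit(script, "<script", "</script>", 4096))  # (0, 717)
```

A small limit keeps most rules cheap, while a tag like <script> that can run for hundreds of characters needs the larger window.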
The bounds limit is just an initial matching expression used to control the range (or boundaries) of the main matching expression. Normally a bounds check simply consists of the HTML start and end tags with an asterisk in between, for example "<script * </script>". Anything valid in the matching expression can be used here, but for a bounds check, the simpler it is, the better.
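The following Python sketch shows the idea of running a simple bounds check first and testing the main expression only on the span it returns. It is an approximation under stated assumptions: `wildcard_to_regex` treats "*" as "as few characters as possible", the main expression is written as a Python regular expression, and all names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Turn a simple bounds pattern such as "<script * </script>" into regex text.

    Assumed syntax (sketch only): "*" matches any run of characters, as few as
    possible, and the spaces around "*" are not significant.
    """
    parts = [re.escape(part.strip()) for part in pattern.split("*")]
    return ".*?".join(parts)


def bounded_replace(page: str, bounds: str, byte_limit: int,
                    main_regex: str, replacement: str) -> str:
    """Hypothetical helper: cheap bounds check first, main expression on its span only."""
    bounds_re = re.compile(wildcard_to_regex(bounds), re.DOTALL | re.IGNORECASE)
    out, pos = [], 0
    while pos <= len(page):
        m = bounds_re.search(page, pos)
        if m is None:
            break
        span = m.group(0)
        if span and len(span) <= byte_limit and re.fullmatch(main_regex, span, re.DOTALL):
            out.append(page[pos:m.start()])  # text before the match, unchanged
            out.append(replacement)          # the whole bounded span is replaced
            pos = m.end()
        else:
            # No match here: keep one character and try again just after it.
            out.append(page[pos:m.start() + 1])
            pos = m.start() + 1
    out.append(page[pos:])
    return "".join(out)


page = '<p>hi</p><script>var ad = 1;</script><script>ok()</script>'
print(bounded_replace(page, "<script * </script>", 256, r".*\bad\b.*", ""))
# prints: <p>hi</p><script>ok()</script>
```

Only the two short <script> spans are ever handed to the main expression, which is where the performance gain described below comes from.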
Its use is optional: you don't need it for many simple matching expressions. With complex matches, however, it can improve performance, since the main expression only needs to be checked if the bounds check succeeds. More importantly, it's also useful for preventing a rule from matching more text than intended. Take the following rule, intended to match a web link:
Matching: <a * href="slugcakes.html" > * </a>
Now suppose it is run against the following text:
<a href="crabcakes.html" > some stuff </a><br>
<a href="slugcakes.html" > other stuff </a>
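The first asterisk in <a * href="slugcakes.html" > * </a> would swallow the rest of the first link and the start of the second, so BOTH links would be matched together! A bounds check such as "<a * </a>" limits each match to a single link, so only the slugcakes.html link matches.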
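The same example can be reproduced with Python regular expressions. Treating each "*" as a lazy `.*?` is an assumption made for this sketch, as are the names `MAIN` and `BOUNDS`; the point is only to show the difference the bounds check makes.

```python
import re

PAGE = ('<a href="crabcakes.html" > some stuff </a><br>\n'
        '<a href="slugcakes.html" > other stuff </a>')

# Hypothetical regex stand-ins for the rule and for a "<a * </a>" bounds check.
MAIN = r'<a.*?href="slugcakes\.html" >.*?</a>'
BOUNDS = r'<a.*?</a>'


def unbounded_match(page: str):
    """Search the page directly, as a rule without a bounds check would."""
    m = re.search(MAIN, page, re.DOTALL)
    return m.group(0) if m else None


def bounded_matches(page: str) -> list[str]:
    """Let the bounds pick out one link at a time, then test the main expression."""
    return [b.group(0) for b in re.finditer(BOUNDS, page, re.DOTALL)
            if re.fullmatch(MAIN, b.group(0), re.DOTALL)]


print(unbounded_match(PAGE) == PAGE)  # True: both links are swallowed
print(bounded_matches(PAGE))          # only the slugcakes.html link
```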
The Matching Expression and Bounds
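Without a bounds check, never start or end a matching expression with a wildcard (for example " *foo* "). Doing so makes the match run all the way out to the byte limit most of the time, catching text you didn't intend.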
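With a bounds check, things are different. The bounds restricts the range of text searched, and the matching expression only has to match the text inside that range. In fact, it's often easiest to put wildcards at the start and end of the expression. Matching variables are commonly used, as in "\1 foo \2": the parts outside the target (here, foo) are captured in \1 and \2 and can then be included in the replacement text.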
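A minimal sketch of the "\1 foo \2" idea, assuming the bounds check has already handed over a span of text and that \1 and \2 behave like regex capture groups. `rewrite_inside_bounds` is a hypothetical name.

```python
import re


def rewrite_inside_bounds(span: str, target: str, new: str):
    r"""Apply a "\1 target \2" style match to text already chosen by a bounds check.

    Hypothetical helper: \1 and \2 are modelled as regex groups that capture
    whatever surrounds the target, so both pieces can be copied into the
    replacement unchanged. Returns None when the target is not in the span.
    """
    m = re.fullmatch(r"(.*?)" + re.escape(target) + r"(.*)", span, re.DOTALL)
    if m is None:
        return None
    return m.group(1) + new + m.group(2)


print(rewrite_inside_bounds('<font color="red" size="2">', "red", "blue"))
# prints: <font color="blue" size="2">
```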
Here is an example that matches a link such as <a href="http://somewhere"> some text </a>:
Bounds: <a\s*</a> (byte limit: 128)
Matching: * href="\1" *
Replace: <a href="\1"> some new link text </a>
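For comparison, here is the link rule approximated with Python's `re` module. The regex translations of the bounds and matching expressions are assumptions for this sketch, not the filter's own syntax, the byte limit is checked simply as the length of the bounded span, and the replacement text is just a placeholder.

```python
import re

# Hypothetical regex approximations of the example rule:
BOUNDS = re.compile(r"<a\s.*?</a>", re.DOTALL | re.IGNORECASE)  # Bounds: <a\s*</a>
BYTE_LIMIT = 128                                                 # byte limit: 128
MAIN = re.compile(r'.*?href="(.*?)".*', re.DOTALL)              # Matching: * href="\1" *


def rewrite_links(page: str) -> str:
    def replace(m: re.Match) -> str:
        span = m.group(0)
        if len(span) > BYTE_LIMIT:
            return span  # longer than the byte limit: leave it alone
        inner = MAIN.fullmatch(span)
        if inner is None:
            return span
        # Replace: <a href="\1"> some new link text </a>
        return f'<a href="{inner.group(1)}"> some new link text </a>'
    return BOUNDS.sub(replace, page)


print(rewrite_links('<p><a href="http://somewhere"> some text </a></p>'))
# prints: <p><a href="http://somewhere"> some new link text </a></p>
```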
URL Match: another way to limit scope
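You can use the URL match to make a filter work only on certain web pages. All the normal matching rules apply here, and you only need to match part of the URL. Use "|" (OR) to include several pages, for example "www.this.com|www.this.too.com", and use "(^...)" (NOT) to exclude pages, for example "(^www.not.this.page)".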
Note that "http://" must not appear in a URL match.
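The "|" and "(^...)" symbols can be pictured with the small Python sketch below. It is a simplification under stated assumptions: "http://" is stripped first, each alternative is treated as a prefix of the URL, and nesting is not handled. `url_matches` is a hypothetical helper.

```python
def url_matches(url: str, pattern: str) -> bool:
    """Very small model of a URL match: "a|b" means OR, "(^x)" means NOT x.

    Hypothetical helper. Assumptions: "http://" is stripped before matching,
    each alternative is compared as a plain prefix of the URL, and nested
    expressions are not supported.
    """
    url = url.removeprefix("http://")
    for alt in pattern.split("|"):
        if alt.startswith("(^") and alt.endswith(")"):
            if not url.startswith(alt[2:-1]):  # NOT: matches when x is absent
                return True
        elif url.startswith(alt):
            return True
    return False


print(url_matches("http://www.this.too.com/page.html", "www.this.com|www.this.too.com"))  # True
print(url_matches("www.not.this.page/index.html", "(^www.not.this.page)"))              # False
```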
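If you have a series of URLs to match, or want several filters to share the same set of URLs, you can use a blockfile. For example, suppose there is a blockfile called "MyURLs" and you want a filter to work only on the sites listed in it: just enter $LST(MyURLs) in the filter's URL match. Likewise, to make the filter work only on sites NOT in the blockfile, wrap it in the NOT symbol: (^$LST(MyURLs)). If you right-click the URL match field, the pop-up menu lets you add or edit blockfiles directly.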
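A blockfile reference can be modelled the same way. In this sketch, `BLOCKFILES` is a hypothetical in-memory dictionary standing in for real blockfiles, entries are compared as URL prefixes (an assumption), and only the two forms shown above, $LST(MyURLs) and (^$LST(MyURLs)), are handled.

```python
# Hypothetical in-memory stand-in for blockfiles; the real program keeps them
# in files, and its exact matching rules may differ.
BLOCKFILES = {"MyURLs": ["www.example.com", "ads.example.net"]}


def url_match_with_list(url: str, pattern: str, blockfiles=BLOCKFILES) -> bool:
    """Handle just "$LST(name)" and "(^$LST(name))" (sketch only)."""
    url = url.removeprefix("http://")
    negate = pattern.startswith("(^") and pattern.endswith(")")
    inner = pattern[2:-1] if negate else pattern
    if not (inner.startswith("$LST(") and inner.endswith(")")):
        raise ValueError("this sketch only understands $LST(name) and (^$LST(name))")
    entries = blockfiles[inner[len("$LST("):-1]]
    listed = any(url.startswith(entry) for entry in entries)
    return listed != negate  # the NOT form flips the result


print(url_match_with_list("http://www.example.com/a.html", "$LST(MyURLs)"))  # True
print(url_match_with_list("www.other.org/", "(^$LST(MyURLs))"))              # True
```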
The same menu also has a filter-test option that makes testing easy.
What does "Allow for multiple matches" do?
Normally, when a rule matches, the result is sent directly to the web browser; no other rules are allowed to process the matched section. This is mainly for efficiency, as it saves quite a bit of work, but it's also a useful way to give certain filters priority over others. Essentially it's first come, first served.
This doesn't always work, however. Take the "<body ... >" tag: it contains several, somewhat unrelated, elements that we may want to change. For instance, if we had two rules, one that changes the default text color and another that changes the background image, we'd have a problem. The first rule would prevent the second from working by "using up" the <body> tag. This is where "Allow for multiple matches" comes in. When checked, it inserts the result of a match back into the processing buffer so other rules can get a whack at it. In the above scenario, if we enabled it on the first rule, the second rule could then also match.
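The difference between the two behaviours can be sketched in Python. The scanning model here is an assumption pieced together from this page: at each position the rules are tried in list order; an ordinary match is sent straight to the output, while a multiple-match result is put back into the buffer so later rules can try the same spot, and the scan then moves on one character. `Rule` and `run_filters` are hypothetical names, and Python regexes stand in for matching expressions.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule record for this sketch."""
    pattern: str            # Python regex standing in for a matching expression
    replace: str            # replacement template, expanded with re.Match.expand
    multiple: bool = False  # models the "Allow for multiple matches" checkbox


def run_filters(page: str, rules: list[Rule], max_steps: int = 10_000) -> str:
    """Scan the page once, trying the rules in list order at each position."""
    compiled = [(re.compile(r.pattern, re.DOTALL | re.IGNORECASE), r) for r in rules]
    buf, out, pos, steps = page, [], 0, 0
    while pos < len(buf):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("a rule seems to be matching its own output")
        consumed = False
        for regex, rule in compiled:
            m = regex.match(buf, pos)
            if m is None:
                continue
            result = m.expand(rule.replace)
            if rule.multiple:
                # Put the result back into the buffer so later rules can see it.
                buf = buf[:pos] + result + buf[m.end():]
            else:
                # First come, first served: send it on, no other rule gets it.
                out.append(result)
                pos = m.end()
                consumed = True
                break
        if not consumed:
            out.append(buf[pos:pos + 1])
            pos += 1
    return "".join(out)


page = '<html><body bgcolor="white">hi</body></html>'
for first_is_multiple in (False, True):
    rules = [Rule("<body", '<body text="black"', multiple=first_is_multiple),
             Rule("<body", '<body background="bg.gif"')]
    print(run_filters(page, rules))
# prints:
# <html><body text="black" bgcolor="white">hi</body></html>
# <html><body background="bg.gif" text="black" bgcolor="white">hi</body></html>
```

With the option off, the text-color rule "uses up" the <body> tag; with it on, the background rule gets its turn as well.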
This option is powerful, but use it sparingly: it takes more processing than usual and, used carelessly, can send a rule into an endless matching loop.
Consider the following situation: say there's a rule with a matching clause of "frog" and a replacement text of "The evil frog must die!". Looks innocent enough, doesn't it? But if this rule had multiple matches enabled, the "frog" in the replacement text would cause the rule to match its own output, resulting in an endless plague of frogs! Why? The first time the rule "sees" the word frog, it inserts the phrase "The evil frog must die!". Simple enough, but the scan continues forward until it hits the new "frog", and the whole process repeats itself. The solution? If "frog" had been the first word in the replacement text, this wouldn't have happened. The next match always starts one character forward, so it would see "rog" instead of "frog". Just make sure your rule won't match its own replacement text minus its first character, and all will be OK.
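The frog problem can be reproduced with a one-rule version of the same idea. This sketch assumes a literal match and the one-character-forward rescan described above; `scan_with_multiple_matches` and `safe_from_itself` are hypothetical helpers, and a step cap stands in for the endless loop.

```python
def scan_with_multiple_matches(text: str, match: str, replace: str,
                               max_steps: int = 1_000):
    """Scan for one literal rule that has "Allow for multiple matches" enabled.

    Hypothetical helper. Returns the rewritten text, or None if the scan is
    still going after max_steps positions (the rule keeps matching its own output).
    """
    buf, pos, steps = text, 0, 0
    while pos < len(buf):
        steps += 1
        if steps > max_steps:
            return None
        if buf.startswith(match, pos):
            # The result goes back into the buffer to be scanned again...
            buf = buf[:pos] + replace + buf[pos + len(match):]
        # ...but the next attempt always starts one character further on.
        pos += 1
    return buf


def safe_from_itself(match: str, replace: str) -> bool:
    """Rule of thumb from the text, literal rules only: the match must not occur
    in the replacement once its first character is dropped."""
    return match not in replace[1:]


print(scan_with_multiple_matches("a frog", "frog", "The evil frog must die!"))  # None
print(scan_with_multiple_matches("a frog", "frog", "frog must die!"))           # a frog must die!
```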