Here is a ready-made java.flex file that supports most Java keywords:

/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Copyright (C) 1998,99 Gerwin Klein <kleing@informatik.tu-muenchen.de>. *
* All rights reserved. *
* *
* This program is free software; you can redistribute it and/or modify *
* it under the terms of the GNU General Public License. See the file *
* COPYRIGHT for more information. *
* *
* This program is distributed in the hope that it will be useful, *
* but WITHOUT ANY WARRANTY; without even the implied warranty of *
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
* GNU General Public License for more details. *
* *
* You should have received a copy of the GNU General Public License along *
* with this program; if not, write to the Free Software Foundation, Inc., *
* 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA *
* *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */


/* Java 1.2 language lexer specification */

/* Note, that this lexer specification is not tuned for speed.
It is in fact quite slow on integer and floating point literals,
because the input is read twice and the methods used to parse
the numbers are not very fast.
For a real world application (e.g. a Java compiler) this can
and should be optimized */

package mj.compiler.parser;

import java_cup.runtime.Symbol;

%%

%public
%class Scanner

%unicode

%cupsym Sym
%cup

%line
%column

%{
/**
* OK, listen up, there's a wrapper thing going on here.
*
* JavaCUP insists that the 'tokens' we return are java_cup.runtime.Symbol.
* And that isn't an interface, it's a class. And it doesn't want to give
* us the instances, it wants to give us the fields of the instances. So
* we can't really do what we want with that class, which is to add line
* and column info to it. Well, we could add it, but we'd have to call it
* left and right, and that doesn't seem very easy to understand.
*
* So, we have our OWN CLASS OF TOKEN, called Token.
* And we have JAVA CUP's CLASS OF TOKEN, called Symbol.
*
* And we wrap instances of Token inside instances of Symbol. Our Tokens
* sit in the value field of Symbols. So Symbol is doing two things:
* (i) getting the token kind to JavaCUP, and (ii) getting our Tokens to
* JavaCUP, so that we can get them back.
*
* We read everything we need out of Tokens (not symbols). Most of that
* happens in the AST constructors.
*/
private Symbol makeSymbol(int kind) {
    return new Symbol(kind, makeToken(kind));
}

/**
* See documentation at makeSymbol.
*/
private Token makeToken(int kind) {
    return new Token(kind, yyline, yycolumn, yytext());
}
%}


/* main character classes */
LineTerminator = \r|\n|\r\n
InputCharacter = [^\r\n]

WhiteSpace = {LineTerminator} | [ \t\f]

/* comments */
Comment = {TraditionalComment} | {EndOfLineComment} | {DocumentationComment}

TraditionalComment = "/*" [^*] {CommentContent} \*+ "/"
UnterminatedComment = "/*" {CommentContent} \** "/"?
EndOfLineComment = "//" {InputCharacter}* {LineTerminator}
UnterminatedEndOfLineComment = "//" {InputCharacter}*
DocumentationComment = "/**" {CommentContent} \*+ "/"

CommentContent = ( [^*] | \*+[^*/] )*

/* identifiers */
Identifier = [:jletter:][:jletterdigit:]*

/* integer literals */
DecIntegerLiteral = 0 | [1-9][0-9]*

%%


/* keywords */
"boolean" { return makeSymbol(Sym.BOOLEAN); }
"break" { return makeSymbol(Sym.BREAK); }
"class" { return makeSymbol(Sym.CLASS); }
"continue" { return makeSymbol(Sym.CONTINUE); }
"else" { return makeSymbol(Sym.ELSE); }
"for" { return makeSymbol(Sym.FOR); }
"int" { return makeSymbol(Sym.INT); }
"new" { return makeSymbol(Sym.NEW); }
"if" { return makeSymbol(Sym.IF); }
"return" { return makeSymbol(Sym.RETURN); }
"void" { return makeSymbol(Sym.VOID); }
"while" { return makeSymbol(Sym.WHILE); }
"this" { return makeSymbol(Sym.THIS); }

/* boolean literals */
"true" { return makeSymbol(Sym.BOOLEAN_LITERAL); }
"false" { return makeSymbol(Sym.BOOLEAN_LITERAL); }

/* null literal */
"null" { return makeSymbol(Sym.NULL_LITERAL); }


/* separators */
"(" { return makeSymbol(Sym.LPAREN); }
")" { return makeSymbol(Sym.RPAREN); }
"{" { return makeSymbol(Sym.LBRACE); }
"}" { return makeSymbol(Sym.RBRACE); }
";" { return makeSymbol(Sym.SEMICOLON); }
"," { return makeSymbol(Sym.COMMA); }
"." { return makeSymbol(Sym.DOT); }

/* operators */
"=" { return makeSymbol(Sym.EQ); }
">" { return makeSymbol(Sym.GT); }
"<" { return makeSymbol(Sym.LT); }
"!" { return makeSymbol(Sym.NOT); }
"~" { return makeSymbol(Sym.COMP); }
"==" { return makeSymbol(Sym.EQEQ); }
"<=" { return makeSymbol(Sym.LTEQ); }
">=" { return makeSymbol(Sym.GTEQ); }
"!=" { return makeSymbol(Sym.NOTEQ); }
"&&" { return makeSymbol(Sym.ANDAND); }
"||" { return makeSymbol(Sym.OROR); }
"++" { return makeSymbol(Sym.PLUSPLUS); }
"--" { return makeSymbol(Sym.MINUSMINUS); }
"+" { return makeSymbol(Sym.PLUS); }
"-" { return makeSymbol(Sym.MINUS); }
"*" { return makeSymbol(Sym.MULT); }
"/" { return makeSymbol(Sym.DIV); }
"&" { return makeSymbol(Sym.AND); }
"|" { return makeSymbol(Sym.OR); }
"^" { return makeSymbol(Sym.XOR); }
"%" { return makeSymbol(Sym.MOD); }
/* numeric literals */

{DecIntegerLiteral} { return makeSymbol(Sym.INTEGER_LITERAL); }

/* comments */
{Comment} { }
{UnterminatedComment} { throw new RuntimeException("Unterminated comment at EOF.\nComment started at line "+(yyline+1)+", column "+(yycolumn+1)); }
{UnterminatedEndOfLineComment} { throw new RuntimeException("Unterminated comment at EOF.\nComment started at line "+(yyline+1)+", column "+(yycolumn+1)); }
/* whitespace */
{WhiteSpace} { }

/* identifiers */
{Identifier} { return makeSymbol(Sym.IDENTIFIER); }


/* error fallback */
.|\n { throw new RuntimeException("Illegal character \""+yytext()+"\" at line "+(yyline+1)+", column "+(yycolumn+1)); }
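
The spec's `makeToken` builds a `Token`, but the post does not include that class. Below is a minimal sketch of what it might look like, assuming only what the constructor call `new Token(kind, yyline, yycolumn, yytext())` implies. The `describe` method and its name table are hypothetical additions, written so the output resembles the `[Token CLASS 2:0 class]` lines posted later in this thread.

```java
import java.util.Map;

/** Hypothetical stand-in for the Token class that makeToken() in the spec assumes. */
final class Token {
    final int kind;     // terminal id, e.g. a constant from the CUP-generated Sym class
    final int line;     // zero-based, as delivered by yyline
    final int column;   // zero-based, as delivered by yycolumn
    final String text;  // the matched lexeme, as delivered by yytext()

    Token(int kind, int line, int column, String text) {
        this.kind = kind;
        this.line = line;
        this.column = column;
        this.text = text;
    }

    /** Formats the token; names maps terminal ids to readable names (an assumption). */
    String describe(Map<Integer, String> names) {
        String name = names.getOrDefault(kind, "#" + kind);
        return "[Token " + name + " " + line + ":" + column + " " + text + "]";
    }
}
```

Because the scanner wraps each `Token` in a `Symbol`, a parser action would presumably read it back from the symbol's `value` field, as the comment in the spec describes.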

Replies, comments and Discussions:

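  • Java question: is there a tool or method that can read a Java source file and pick out its keywords or key structures (for example int, for, {} and so on)? It sounds a bit like a parser, but I have no real idea what a parser is, so I would appreciate a crash course from the experts here.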
    • JFlex
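      • Thanks! I just skimmed the JFlex web page. It looks like a scanner generator; how is that different from a parser generator? I am hazy on what scanner, scanner generator, parser, and parser generator each mean.
        Could you briefly explain what JFlex mainly does and what it outputs? Thanks!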
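        • With JFlex, the input is the file being analyzed. You write a spec file that helps JFlex recognize symbols: regular expressions describe the symbols you want, and Java code gives the action to take when each symbol is matched. The output is a Java source file.
          • With JFlex you have to define the keywords yourself. The input is a Java program and the output is TOKENS; for example, 1+2 yields 1, +, 2. That does not meet the original poster's requirement. If you don't know, SHUT UP and don't mislead people.
            • I seriously doubt you have ever used a lexer generator. How would you avoid supplying the keywords? Should the software discover them by itself? Tokens are an internal representation; what gets output after a token is matched is handled by the user's Java code, and if nothing handles it, the default action is to output it.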
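              • You don't understand.
                • Please give your reasons.
                  • #2059453
                    • Is that your reason for telling me to shut up? Finding a ready-made program online that proves my point?
                      • Have you ever actually seen JFlex output?? Note that to JFlex, digits, letters, and brackets are all keywords. The output is not meant for human eyes; it is meant for a PARSER to analyze.
                        • The user can define the output actions, which means that when I see a TOKEN I can emit whatever symbols I like. Got it? Knowing how to write return does not mean you know how to use JFlex.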
                        • FYI, this is output of jflex, see if anyone wants to analyse it with his/her own eyes.
                          // Result: 4950

                          class For1 {

                          int main() {
                          int c = 0;
                          for (int i=0; i<100; i++) {
                          c=c+i;
                          }
                          return c;
                          }

                          }

                          [Token CLASS 2:0 class]
                          [Token IDENTIFIER 2:6 For1]
                          [Token LBRACE 2:11 {]
                          [Token INT 4:4 int]
                          [Token IDENTIFIER 4:8 main]
                          [Token LPAREN 4:12 (]
                          [Token RPAREN 4:13 )]
                          [Token LBRACE 4:15 {]
                          [Token INT 5:2 int]
                          [Token IDENTIFIER 5:6 c]
                          [Token EQ 5:8 =]
                          [Token INTEGER_LITERAL 5:10 0]
                          [Token SEMICOLON 5:11 ;]
                          [Token FOR 6:2 for]
                          [Token LPAREN 6:6 (]
                          [Token INT 6:7 int]
                          [Token IDENTIFIER 6:11 i]
                          [Token EQ 6:12 =]
                          [Token INTEGER_LITERAL 6:13 0]
                          [Token SEMICOLON 6:14 ;]
                          [Token IDENTIFIER 6:16 i]
                          [Token LT 6:17 <]
                          [Token INTEGER_LITERAL 6:18 100]
                          [Token SEMICOLON 6:21 ;]
                          [Token IDENTIFIER 6:23 i]
                          [Token PLUSPLUS 6:24 ++]
                          [Token RPAREN 6:26 )]
                          [Token LBRACE 6:28 {]
                          [Token IDENTIFIER 7:6 c]
                          [Token EQ 7:7 =]
                          [Token IDENTIFIER 7:8 c]
                          [Token PLUS 7:9 +]
                          [Token IDENTIFIER 7:10 i]
                          [Token SEMICOLON 7:11 ;]
                          [Token RBRACE 8:2 }]
                          [Token RETURN 9:2 return]
                          [Token IDENTIFIER 9:9 c]
                          [Token SEMICOLON 9:10 ;]
                          [Token RBRACE 10:4 }]
                          [Token RBRACE 12:0 }]
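
To show where a listing like this comes from, here is a self-contained sketch that produces lines of the same shape, `[Token KIND line:column text]`, using `java.util.regex` instead of a JFlex-generated scanner. It is not the poster's code: the class name `MiniLexer`, the reduced keyword and symbol sets, and the output format are assumptions made for illustration. One real difference from JFlex is noted in the comments: JFlex picks the longest match, while Java's regex alternation picks the first alternative that matches, so ordering matters here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical hand-rolled scanner; not generated by JFlex. */
final class MiniLexer {
    // Printable names for the few symbols this sketch knows about.
    private static final Map<String, String> SYMBOLS = Map.of(
        "=", "EQ", "<", "LT", "+", "PLUS", "++", "PLUSPLUS",
        "(", "LPAREN", ")", "RPAREN", "{", "LBRACE", "}", "RBRACE",
        ";", "SEMICOLON");

    // First matching alternative wins (not the longest), so keywords come
    // before identifiers and "++" comes before "+".
    private static final Pattern TOKEN = Pattern.compile(
        "(?<SKIP>\\s+|//[^\\r\\n]*|/\\*.*?\\*/)"
        + "|(?<KEYWORD>(?:class|int|for|return)(?![A-Za-z0-9_$]))"
        + "|(?<IDENTIFIER>[A-Za-z_$][A-Za-z0-9_$]*)"
        + "|(?<INTEGER>0|[1-9][0-9]*)"
        + "|(?<SYMBOL>\\+\\+|[=<+(){};])",
        Pattern.DOTALL);

    /** Returns one formatted line per token; line and column are zero-based. */
    static List<String> tokenize(String source) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        int pos = 0, line = 0, lineStart = 0;
        while (pos < source.length()) {
            m.region(pos, source.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Illegal character '" + source.charAt(pos)
                    + "' at line " + (line + 1) + ", column " + (pos - lineStart + 1));
            }
            String text = m.group();
            if (m.group("SKIP") == null) {
                out.add("[Token " + kindOf(m) + " " + line + ":" + (pos - lineStart) + " " + text + "]");
            }
            // Only '\n' advances the line counter in this sketch.
            for (int i = 0; i < text.length(); i++) {
                if (text.charAt(i) == '\n') {
                    line++;
                    lineStart = pos + i + 1;
                }
            }
            pos = m.end();
        }
        return out;
    }

    private static String kindOf(Matcher m) {
        if (m.group("KEYWORD") != null) return m.group().toUpperCase(Locale.ROOT);
        if (m.group("IDENTIFIER") != null) return "IDENTIFIER";
        if (m.group("INTEGER") != null) return "INTEGER_LITERAL";
        return SYMBOLS.get(m.group());
    }

    public static void main(String[] args) {
        tokenize("class For1 {\n  int c = 0;\n}").forEach(System.out::println);
    }
}
```

Whether the result is "for human eyes" is decided entirely by the action code: the same matches could just as well be counted, filtered, or handed to a parser.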
                          • Wow, that's complicated!
            • Right or wrong, help is HELP. The rude person should shut up.
              • If you really want to try JFlex, you need to go on and use JavaCUP to analyse the tokens. Then you can write a Java compiler. But you still cannot solve your problem ^_^
                • CUP is a parser generator. The original poster's requirements are not specific yet, so CUP may not be needed at all.
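                  • I just got home and saw all these replies; I'm touched (having people offer ideas when you are completely lost in a programming problem is a real blessing). What I mainly want to build is an auto fault seeder for Java code. The input is a Java source file. Step 1: automatically generate a bug template, marking the places where a bug could be inserted
                    with tags like ◎bugtype1here, so the bug template is no longer pure Java source code. Step 2: choose the bug types to use and the number of bugs to generate, then automatically produce buggy Java source code from the template. My question about finding keywords concerns step 1: the goal is to read the Java source, find the places where bugs can be added, and mark them accordingly. For now I only have a rough idea; there is still a long way to go. Today I also asked a question about classifying the various kinds of Java bugs, but sadly nobody has answered yet.
                    Finally, many thanks to everyone who took the time to reply. Please don't quarrel, or I will feel bad.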
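                    • 1. JFlex can roughly handle this job, but it has a learning curve. 2. Sometimes a problem is not as simple as it looks, and JFlex has its limitations.
                      As soon as your task involves syntax that regular expressions cannot express, JFlex stops working. Writing the code yourself would be simpler.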
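                      • About "you write a spec file that helps JFlex recognize symbols, with regular expressions describing the symbols you want": what is a regular expression? Roughly what would the contents of that file look like? Could you give an example? Thanks.
                        • I already gave you the ready-made "file" (#2059453) and the "example" (#2059489), and you just won't look. Nothing more I can do; I'll SHUT UP myself *_*
                          • Criticism accepted. When I asked last night I had not yet had time to read all the replies carefully, but today I printed out the example you gave and will study it slowly. Thanks. I also googled a bit and now roughly understand what a regular expression is.
                            That said, I think I first need to sort out the various bug types and how to decide where a bug can be added; only then will I know what information I need to parse out of the code to make that decision.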
                • Here is a ready-made java.flex file that supports most Java keywords (#2059453, posted in full above).
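        • Don't rush, my friend; first get your problem clear.
          If you simply want to write a program that reads a file, such as the Java file you mention, and picks out particular character combinations, I don't think you need any tool. You can meet your need with an algorithm based on automata theory. Even the JFlex that the experts above mentioned presumably comes down to the same algorithms, and it may well be more complex, harder to analyze, and harder for you to master. It's your call.

          That is how people generally write programs: many pick up a ready-made tool, and if it works well, that is perfectly fine. Others have no choice but to open the textbook, start from the simplest principles, and build their own algorithm and ideas step by step on top of general-purpose algorithms.

          So please state the problem clearly, to avoid confusion and make discussion easier.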
    • You might use the idea of a DFA to develop your own algorithm; it works for many applications.
      I think it depends on what kind of application you are going to implement. There are many cases where you cannot use someone else's code like JFlex. Actually, I don't know what JFlex is. :)
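
The DFA idea can be sketched as a hand-written state machine with three states. This is only an illustration under stated assumptions: the class name `KeywordDfa`, the short keyword set, and the choice to report keywords as plain strings are all made up here. It also ignores comments and string literals, so a keyword inside either would be reported too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical hand-coded DFA that lists the keywords found in a source string. */
final class KeywordDfa {
    private enum State { START, IN_WORD, IN_NUMBER }

    // A deliberately short keyword list; a real tool would cover the full set.
    private static final Set<String> KEYWORDS =
        Set.of("class", "int", "for", "if", "else", "while", "return");

    static List<String> findKeywords(String source) {
        List<String> found = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        State state = State.START;
        // One extra iteration with a sentinel space flushes a word that ends at EOF.
        for (int i = 0; i <= source.length(); i++) {
            char c = i < source.length() ? source.charAt(i) : ' ';
            switch (state) {
                case START:
                    if (Character.isJavaIdentifierStart(c)) {
                        word.append(c);
                        state = State.IN_WORD;
                    } else if (Character.isDigit(c)) {
                        state = State.IN_NUMBER;
                    }
                    break;
                case IN_WORD:
                    if (Character.isJavaIdentifierPart(c)) {
                        word.append(c);
                    } else {
                        if (KEYWORDS.contains(word.toString())) {
                            found.add(word.toString());
                        }
                        word.setLength(0);
                        state = State.START;
                    }
                    break;
                case IN_NUMBER:
                    // Letters glued to digits (e.g. "1int") stay part of the number run.
                    if (!Character.isJavaIdentifierPart(c)) {
                        state = State.START;
                    }
                    break;
            }
        }
        return found;
    }
}
```

For example, `KeywordDfa.findKeywords("for (int i=0; i<100; i++) { interest = 1; }")` reports `for` and `int` but not `interest`, because a word is only checked against the keyword set once the whole identifier has been read. A generated scanner does the same job from a table instead of a hand-written `switch`.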
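    • Everyone is kindly trying to help me, so please don't quarrel. It is my fault for not explaining clearly at the start. I have written more detailed requirements in #2059495; please help me with more ideas. Many thanks.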
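      • Must the bugs in your bug template be found in each input .java file, or do they come from a previously saved bug database, or is each bug you read written into the template?
        Either way, you can find the bugs with general automata algorithms. That should be your first step.

        Step two: build a lookup table that gives each bug in the template an index, then randomly generate the indices, the number of bugs to insert, and their positions. Of course you should have a variable that tracks the position in the code. Then you can produce the Java source code with bugs that you want.

        You are doing testing, right? I would like to learn from you.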
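        • 1. Each Java source file yields one bug template Java file. Generating it requires rules you define yourself: "where a bug can go" and "how to tag the places where a bug can go". The generated template Java file is the basis for producing faulty code later;
          one template Java file can produce many different versions of faulty code whenever needed.
          2. I would like to build a RuleSet that defines "where a bug can go", "how to tag the places where a bug can go", and so on. I hope this RuleSet will be easy to extend in the future.
          3. Yes, it is about testing. Let's learn from each other. (For now it is mainly me learning from you.)
          : )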
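
The two-step plan described in this sub-thread (tag the fault sites in a template, then generate faulty code from the template) can be sketched roughly as follows. Everything here is hypothetical: the `FaultSeeder` and `Rule` names, the `@{rule:original}@` tag format, and the two sample rules. It also matches regexes against raw text, so it would wrongly tag `<` inside generics, strings, or comments; a real tool would more likely apply its rules to the token stream from a scanner.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical two-step fault seeder: source -> tagged template -> faulty source. */
final class FaultSeeder {
    /** One entry of the rule set: where a fault may go and what the faulty text is. */
    static final class Rule {
        final String name;
        final Pattern site;
        final String faulty;

        Rule(String name, Pattern site, String faulty) {
            this.name = name;
            this.site = site;
            this.faulty = faulty;
        }
    }

    // Assumed tag format: @{ruleName:originalText}@
    private static final Pattern MARKER = Pattern.compile("@\\{(\\w+):(.*?)\\}@");

    /** Step 1: wrap every fault site in a tag. Single pass, first matching rule wins. */
    static String toTemplate(String source, List<Rule> rules) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        outer:
        while (pos < source.length()) {
            for (Rule r : rules) {
                Matcher m = r.site.matcher(source).region(pos, source.length());
                if (m.lookingAt() && m.end() > pos) {
                    out.append("@{").append(r.name).append(':').append(m.group()).append("}@");
                    pos = m.end();
                    continue outer;
                }
            }
            out.append(source.charAt(pos++));
        }
        return out.toString();
    }

    /** Step 2: enabled rules inject their fault; every other tag restores the original. */
    static String seed(String template, List<Rule> rules, Set<String> enabled) {
        Matcher m = MARKER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = m.group(2);
            if (enabled.contains(m.group(1))) {
                for (Rule r : rules) {
                    if (r.name.equals(m.group(1))) {
                        replacement = r.faulty;
                    }
                }
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With a rule named `relop` that turns `<` into `<=`, the loop header `i<100` becomes `i@{relop:<}@100` in the template and `i<=100` in the seeded output. Keeping the rules in a plain list is one way to make the rule set easy to extend; choosing how many tagged sites to mutate, as the plan also calls for, is left out of this sketch.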
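    • We use JavaCC to parse expressions. If you want to parse an entire Java file, I expect it will be a lot of work.
    • I'm not sure this is right, but I think you could just write a regular expression in Perl to filter out all the Java keywords and that would be OK, heh.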
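    • I wandered in here and found the place full of experts. In the spirit of helping others, here are a few suggestions of my own.
      I am new to Toronto. After attending the party last night I regret not meeting some of the celebrities, such as 醉里吴音, 饭得自 and the real 猪主席 (although I have big issues with him). When there is a chance, please everyone post wanted notices with photos.
      1. On the question of classifying Java bugs: Java 101. Error and Exception. They are all under java.lang.Throwable. You can get them from the API docs.
      2. A compiler class exists in sun.tools.javac.Main. You may use the methods inside.
      I know the experts will throw bricks, but you can't tell a newcomer "If you don't know, SHUT UP and don't mislead people."
      I wanted to give a brother some ideas; even if they are wrong, I am not trying to mislead you.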
      • Shut up. Do you really know what you're talking about?
    • Use Class: Java's own built-in Class class. Tweak it a bit and you're done.