
\documentclass[a4paper,10pt]{article}
\usepackage[utf8]{inputenc}

%opening
\title{Compilers Assignment - Lexical Analyzer}
\author{Leonardo Ferreira e Mateus Tassinari}

\begin{document}

\maketitle

\begin{abstract}
Documentation of our lexical analyzer, which will later be used by an interpreter for the G-Portugol language.
\end{abstract}

\section{Introduction}
Lexical analysis, in general terms, is the process of reading every character of a given source code and producing a sequence of symbols called ``tokens''.
A token describes a pattern of characters that has a specific meaning in the language being interpreted; it may be an identifier, an operator, a character, a number, an assignment or a reserved word of that language.

In this work we developed a lexical analyzer for the G-Portugol language using the Flex tool. Flex saves us most of the work: we only need to describe regular expressions, each paired with a block of code, and it generates a lexical analyzer for those expressions in C.
\section{Character Classification}
  \subsection{Keywords}
  The G-Portugol language defines the following keywords, or reserved words:
  \begin{verbatim}
  fim-variáveis, algoritmo, variáveis, inteiro, real,
  caractere, literal, lógico, início, verdadeiro,
  falso, fim, ou, e, não, se, senão, então, fim-se,
  faça, fim-enquanto, para, de, até, enquanto,
  fim-para, matriz, inteiros, reais, caracteres,
  literais, lógicos, função, retorne, passo.
  \end{verbatim}

  The corresponding block of rules, in which the keywords are written without diacritics, is:
  \begin{verbatim}
fim-variaveis   {printf("%s: PALAVRA RESERVADA\n", yytext);}
caractere   {printf("%s: PALAVRA RESERVADA\n", yytext);}
se      {printf("%s: PALAVRA RESERVADA\n", yytext);}
faca        {printf("%s: PALAVRA RESERVADA\n", yytext);}
fim-para    {printf("%s: PALAVRA RESERVADA\n", yytext);}
literais    {printf("%s: PALAVRA RESERVADA\n", yytext);}
algoritmo   {printf("%s: PALAVRA RESERVADA\n", yytext);}
literal     {printf("%s: PALAVRA RESERVADA\n", yytext);}
fim     {printf("%s: PALAVRA RESERVADA\n", yytext);}
senao       {printf("%s: PALAVRA RESERVADA\n", yytext);}
fim-enquanto    {printf("%s: PALAVRA RESERVADA\n", yytext);}
matriz      {printf("%s: PALAVRA RESERVADA\n", yytext);}
logicos     {printf("%s: PALAVRA RESERVADA\n", yytext);}
variaveis   {printf("%s: PALAVRA RESERVADA\n", yytext);}
logico      {printf("%s: PALAVRA RESERVADA\n", yytext);}
ou      {printf("%s: PALAVRA RESERVADA\n", yytext);}
entao       {printf("%s: PALAVRA RESERVADA\n", yytext);}
para        {printf("%s: PALAVRA RESERVADA\n", yytext);}
inteiros    {printf("%s: PALAVRA RESERVADA\n", yytext);}
funcao      {printf("%s: PALAVRA RESERVADA\n", yytext);}
inteiro     {printf("%s: PALAVRA RESERVADA\n", yytext);}
inicio      {printf("%s: PALAVRA RESERVADA\n", yytext);}
e       {printf("%s: PALAVRA RESERVADA\n", yytext);}
fim-se      {printf("%s: PALAVRA RESERVADA\n", yytext);}
de      {printf("%s: PALAVRA RESERVADA\n", yytext);}
reais       {printf("%s: PALAVRA RESERVADA\n", yytext);}
retorne     {printf("%s: PALAVRA RESERVADA\n", yytext);}
real        {printf("%s: PALAVRA RESERVADA\n", yytext);}
nao     {printf("%s: PALAVRA RESERVADA\n", yytext);}
enquanto    {printf("%s: PALAVRA RESERVADA\n", yytext);}
ate     {printf("%s: PALAVRA RESERVADA\n", yytext);}
caracteres  {printf("%s: PALAVRA RESERVADA\n", yytext);}
passo       {printf("%s: PALAVRA RESERVADA\n", yytext);}

  \end{verbatim}
The reserved words ``verdadeiro'' and ``falso'' are classified as logical values in our analyzer rather than as reserved words.
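
As an illustration only, the same classification can be pictured as a table lookup instead of one rule per keyword. The sketch below is not part of our Flex specification: the names \texttt{KEYWORDS} and \texttt{is\_keyword} are hypothetical, and the table simply repeats the keywords from the rules above.
\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* Hypothetical table with the keywords from the Flex rules. */
static const char *const KEYWORDS[] = {
    "fim-variaveis", "caractere", "se", "faca", "fim-para",
    "literais", "algoritmo", "literal", "fim", "senao",
    "fim-enquanto", "matriz", "logicos", "variaveis", "logico",
    "ou", "entao", "para", "inteiros", "funcao", "inteiro",
    "inicio", "e", "fim-se", "de", "reais", "retorne", "real",
    "nao", "enquanto", "ate", "caracteres", "passo"
};

/* Returns 1 if word is one of the reserved words, 0 otherwise. */
int is_keyword(const char *word)
{
    size_t n = sizeof KEYWORDS / sizeof KEYWORDS[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(word, KEYWORDS[i]) == 0)
            return 1;
    }
    return 0;
}
\end{verbatim}
With this table, \texttt{is\_keyword("fim-se")} returns 1, while \texttt{is\_keyword("verdadeiro")} returns 0, since the logical values are handled separately.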
  \subsection{Identifiers}
  According to the G-Portugol manual, every identifier (names of variables, functions, etc.) begins with a letter (upper or lower case) or an underscore, followed by zero or more letters, digits or underscores.
  \begin{verbatim}

[a-zA-Z_][a-zA-Z0-9_]*  { printf("%s -> IDENTIFICADOR\n",yytext);}
  \end{verbatim}
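
To make the pattern concrete, the following is a minimal hand-written sketch of the same check in C. It is not the code generated by Flex; the function names are hypothetical and the character tests assume an ASCII-compatible encoding.
\begin{verbatim}
/* First character: letter or underscore ([a-zA-Z_]). */
static int is_ident_start(int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || c == '_';
}

/* Remaining characters: letter, digit or underscore. */
static int is_ident_part(int c)
{
    return is_ident_start(c) || (c >= '0' && c <= '9');
}

/* Returns 1 if the whole string s matches
   [a-zA-Z_][a-zA-Z0-9_]*, and 0 otherwise. */
int is_identifier(const char *s)
{
    if (!is_ident_start((unsigned char)*s))
        return 0;
    for (s++; *s != '\0'; s++) {
        if (!is_ident_part((unsigned char)*s))
            return 0;
    }
    return 1;
}
\end{verbatim}
For instance, \texttt{\_x1} is accepted, while \texttt{1x} and \texttt{fim-se} are rejected, the latter because the hyphen is not an identifier character.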


\subsection{Integers and Floats}
  Integers are defined as a sequence of digits optionally followed by a positive exponent. To recognize them, we define:
  \begin{verbatim}
digitos [0-9]+
expoente_positivo [eE]{digitos}
inteiro {digitos}{expoente_positivo}?
\end{verbatim}

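Floats are defined as a sequence of digits optionally followed by a fractional part (a point and another sequence of digits) and by an exponent, which may be negative. Because both the fraction and the exponent are optional, this pattern also matches every integer; when two patterns match the same text, Flex uses the rule listed first. Floats are recognized as: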
\begin{verbatim}
expoente [eE]-?{digitos}
fracao "."{digitos}
float {digitos}{fracao}?{expoente}?
\end{verbatim}
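
The float definition above can be read as a small hand-written matcher. The sketch below is only an illustration of how the pattern consumes its input, not the scanner produced by Flex; \texttt{scan\_digits} and \texttt{match\_float} are hypothetical names.
\begin{verbatim}
#include <stddef.h>

/* Number of leading decimal digits in s ({digitos}). */
static size_t scan_digits(const char *s)
{
    size_t n = 0;
    while (s[n] >= '0' && s[n] <= '9')
        n++;
    return n;
}

/* Length of the longest prefix of s matching
   {digitos}{fracao}?{expoente}?, or 0 if nothing matches. */
size_t match_float(const char *s)
{
    size_t n = scan_digits(s);
    if (n == 0)
        return 0;
    /* optional fraction: "." followed by at least one digit */
    if (s[n] == '.') {
        size_t d = scan_digits(s + n + 1);
        if (d > 0)
            n += 1 + d;
    }
    /* optional exponent: [eE], optional "-", then digits */
    if (s[n] == 'e' || s[n] == 'E') {
        size_t k = n + 1;
        if (s[k] == '-')
            k++;
        size_t d = scan_digits(s + k);
        if (d > 0)
            n = k + d;
    }
    return n;
}
\end{verbatim}
Note that a plain integer such as \texttt{42} is also matched in full, since both optional parts may be absent, whereas in \texttt{1.} only the leading digit is matched.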


  \subsection{Arithmetic Operators}
  The arithmetic operators are \verb|+|, \verb|-|, \verb|*|, \verb|/|, \verb|++|, \verb|--|, \verb|%|, \verb|<<| and \verb|>>|. The corresponding code is:
  \begin{verbatim}
"+" {printf("%s -> OPERADORARITMETICO\n",yytext);}
"-" {printf("%s -> OPERADORARITMETICO\n",yytext);}
"*" {printf("%s -> OPERADORARITMETICO\n",yytext);}
"/" {printf("%s -> OPERADORARITMETICO\n",yytext);}
"%" {printf("%s -> OPERADORARITMETICO\n",yytext);}
"++" {printf("%s -> OPERADORARITMETICO\n",yytext);}
"--" {printf("%s -> OPERADORARITMETICO\n",yytext);}
">>" {printf("%s -> OPERADORARITMETICO\n",yytext);}
"<<" {printf("%s -> OPERADORARITMETICO\n",yytext);}

  \end{verbatim}


  \subsection{Relational Operators}
  The relational operators are \verb|>|, \verb|>=|, \verb|<|, \verb|<=|, \verb|==| and \verb|!=|. The corresponding code is:
  \begin{verbatim}
">" {printf("%s -> OPERADORRELACIONAL\n",yytext);}
"<" {printf("%s -> OPERADORRELACIONAL\n",yytext);}
">=" {printf("%s -> OPERADORRELACIONAL\n",yytext);}
"<=" {printf("%s -> OPERADORRELACIONAL\n",yytext);}
"==" {printf("%s -> OPERADORRELACIONAL\n",yytext);}
"!=" {printf("%s -> OPERADORRELACIONAL\n",yytext);}
  \end{verbatim}

  \subsection{Logical Operators}
  The logical operators are \verb|&&|, \verb+||+, \verb|!| and \verb|^|; the analyzer also classifies \verb|&| and \verb+|+ as logical operators. The corresponding code is:
  \begin{verbatim}
"&&" {printf("%s -> OPERADORLOGICO\n",yytext);}
"||" {printf("%s -> OPERADORLOGICO\n",yytext);}
"&" {printf("%s -> OPERADORLOGICO\n",yytext);}
"|" {printf("%s -> OPERADORLOGICO\n",yytext);}
"^" {printf("%s -> OPERADORLOGICO\n",yytext);}
"!" {printf("%s -> OPERADORLOGICO\n",yytext);}
  \end{verbatim}

  \subsection{Special Symbols and Assignment Operators}
  The special symbols are \verb|(|, \verb|)|, \verb|,|, \verb|;|, \verb|:|, \verb|{|, \verb|}|, \verb|#|, \verb|'|, \verb|\|, \verb|"| and \verb|.|. The code corresponding to them and to the assignment operators is:
  \begin{verbatim}
"(" {printf("%s -> SIMBOLOESPECIAL\n",yytext);}
")" {printf("%s -> SIMBOLOESPECIAL\n",yytext);}
"," {printf("%s -> SIMBOLOESPECIAL\n",yytext);}
";" {printf("%s -> SIMBOLOESPECIAL\n",yytext);}
":" {printf("%s -> SIMBOLOESPECIAL\n",yytext);}
"{" {printf("%s -> SIMBOLOESPECIAL\n",yytext);}
"}" {printf("%s -> SIMBOLOESPECIAL\n",yytext);}
"#" {printf("%s -> SIMBOLOESPECIAL\n",yytext);}
"'" {printf("%s -> SIMBOLOESPECIAL\n",yytext);}
"\"" {printf("%s -> SIMBOLOESPECIAL\n",yytext);}
"\\" {printf("%s -> SIMBOLOESPECIAL\n",yytext);}
"." {printf("%s -> SIMBOLOESPECIAL\n",yytext);}
"=" {printf("%s -> ATRIBUICAO\n",yytext);}
"+=" {printf("%s -> ATRIBUICAO\n",yytext);}
"-=" {printf("%s -> ATRIBUICAO\n",yytext);}
"*=" {printf("%s -> ATRIBUICAO\n",yytext);}
"/=" {printf("%s -> ATRIBUICAO\n",yytext);}
"&=" {printf("%s -> ATRIBUICAO\n",yytext);}
"^=" {printf("%s -> ATRIBUICAO\n",yytext);}
"|=" {printf("%s -> ATRIBUICAO\n",yytext);}
">>=" {printf("%s -> ATRIBUICAO\n",yytext);}
"<<=" {printf("%s -> ATRIBUICAO\n",yytext);}

  \end{verbatim}
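
Several of the operators above share a prefix, for example \verb|>|, \verb|>=|, \verb|>>| and \verb|>>=|. Flex resolves this by always choosing the rule that matches the longest text. The sketch below illustrates that choice by hand for this one family of operators; the table \texttt{OPERATORS} and the function \texttt{longest\_operator} are hypothetical and are not part of our specification.
\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* Hypothetical subset: the operators that begin with ">". */
static const char *const OPERATORS[] = {
    ">", ">=", ">>", ">>="
};

/* Returns the longest operator in OPERATORS that is a prefix
   of s, or NULL if none of them is. */
const char *longest_operator(const char *s)
{
    const char *best = NULL;
    size_t best_len = 0;
    size_t n = sizeof OPERATORS / sizeof OPERATORS[0];
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(OPERATORS[i]);
        if (len > best_len && strncmp(s, OPERATORS[i], len) == 0) {
            best = OPERATORS[i];
            best_len = len;
        }
    }
    return best;
}
\end{verbatim}
For the input \verb|>>= 1| the function selects \verb|>>=|, which the analyzer reports as an assignment rather than as two or three shorter operators.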

\subsection{Comments}
  Comments must be ignored by the lexical analyzer. Single-line comments are simple, but block comments are somewhat more complex. The definitions used to recognize them are:
  \begin{verbatim}
comentario_linha "//"[^\n]*
comentario_bloco "/*"[^*]*"*"+([^*/][^*]*"*"+)*"/"
\end{verbatim}
In addition, whenever a block comment is found, the variable \texttt{lines}, which counts the number of lines in the file, is adjusted to include the line breaks inside the comment:
\begin{verbatim}
{comentario_bloco} {
  /* count the line breaks inside the matched comment */
  for(i=0; i<yyleng; i++) {
    if(yytext[i] == '\n') lines++;
  }
}
\end{verbatim}
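
The same idea, skipping a block comment while keeping the line count correct, can be sketched without Flex. The function below is a hand-written illustration under our own assumptions, not the generated scanner; the name \texttt{skip\_block\_comment} and its interface are hypothetical.
\begin{verbatim}
#include <stddef.h>

/* If s begins with a terminated block comment, returns its
   length in characters and adds the number of line breaks
   inside it to *lines. Returns 0 otherwise and leaves *lines
   unchanged. */
size_t skip_block_comment(const char *s, int *lines)
{
    int newlines = 0;
    if (s[0] != '/' || s[1] != '*')
        return 0;
    for (size_t i = 2; s[i] != '\0'; i++) {
        if (s[i] == '\n') {
            newlines++;
        } else if (s[i] == '*' && s[i + 1] == '/') {
            *lines += newlines;
            return i + 2;
        }
    }
    return 0; /* the comment is never closed */
}
\end{verbatim}
The line count is only updated once the closing delimiter is found, so an unterminated comment leaves it untouched.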

\section{Examples}
In the examples, we sought to identify the various operators, identifiers and other features of the input. We also paid attention to block comments, which proved problematic.
\end{document}