@@ -0,0 +1,268 @@
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T2A]{fontenc}
\usepackage[russian]{babel}

\title{Finite-state Morphology}
\author{Nikolay Babakov}
\date{November 2018}

\begin{document}

\maketitle

Tasks overview

\section{Archiphonemes}
At this stage we add a new case, the instrumental, to the lexc file. I made the following changes:

\begin{verbatim}
Multichar_Symbols
%<ins%> ! Instrumental case
%{A%} ! Archiphoneme а/е
\end{verbatim}

This gave the required output:

\begin{verbatim}
$ hfst-lexc chv.lexc | hfst-fst2strings
hfst-lexc: warning: Defaulting to OpenFst tropical type
Root...1 CASES...1 PLURAL...2 N...1 Nouns...
пакча<n><ins>:пакча>п{A}
пакча<n><pl><ins>:пакча>сем>п{A}
урам<n><ins>:урам>п{A}
урам<n><pl><ins>:урам>сем>п{A}
канаш<n><ins>:канаш>п{A}
канаш<n><pl><ins>:канаш>сем>п{A}
хула<n><ins>:хула>п{A}
хула<n><pl><ins>:хула>сем>п{A}
\end{verbatim}
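The archiphoneme \verb|{A}| is a placeholder that the phonological rules later turn into а or е. The following Python sketch illustrates that choice on the lexc output above. It assumes the back-vowel and consonant sets that appear later in \texttt{chv.twol}; the function name \texttt{resolve\_A} is hypothetical and the code is only an illustration, not part of the HFST toolchain.

```python
import re

# Back vowels and consonants as listed in the BackVow and Cns sets of chv.twol
# (the loan-word marker {ъ} is left out of this small sketch).
BACK_VOWELS = set("ӑаыоуяёю")
CONSONANTS = set("бвгджзклмнпрсҫтфхцчшщйьъ")


def resolve_A(lexical: str) -> str:
    """Replace {A} with а or е and drop the morpheme boundary '>'."""
    # Split into symbols, keeping multichar symbols such as {A} in one piece.
    symbols = re.findall(r"\{[^}]*\}|.", lexical)
    out = []
    for i, sym in enumerate(symbols):
        if sym == "{A}":
            # Walk left over consonants and boundaries to the nearest other symbol.
            j = i - 1
            while j >= 0 and (symbols[j] in CONSONANTS or symbols[j] == ">"):
                j -= 1
            skipped_some = j < i - 1
            is_back = j >= 0 and skipped_some and symbols[j] in BACK_VOWELS
            out.append("а" if is_back else "е")
        elif sym != ">":
            out.append(sym)
    return "".join(out)


print(resolve_A("урам>п{A}"))       # урампа
print(resolve_A("пакча>сем>п{A}"))  # пакчасемпе
```

In the second example the nearest vowel to the left of \verb|{A}| is the е of the plural suffix, which is not in the back-vowel set, so the sketch picks е.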


\section{Phonological rules}
I tried several rules and exported each one to its own file (\texttt{left\_right\_rule}, \texttt{left\_rule}, etc.). In each file I pasted the original text and marked the changed examples with ``!!!'' so that they are easy to spot.

\section{Phonological rules}
I changed the lexc file according to the instructions:

\begin{verbatim}
LEXICON CASES

%<ins%>:%>п%{A%} # ;
%<gen%>:%>%{Ă%}н # ;

Multichar_Symbols
%<gen%> ! Genitive case
\end{verbatim}

This gave the required output:

\begin{verbatim}
$ hfst-fst2strings chv.lexc.hfst | grep урам | grep gen
урам<n><gen>:урам>{Ă}н
урам<n><pl><gen>:урам>се{м}>{Ă}н
\end{verbatim}

After modifying the lexc file I also added the \verb|%{м%}| archiphoneme and implemented a rule that deletes it:

\begin{verbatim}
"Deletion of the {м} archiphoneme"
%{м%}:0 <=> _ %>: %{Ă%}: н ;
\end{verbatim}

Here is my implementation of back vowel harmony, with its exceptions:

\begin{verbatim}
"Back vowel harmony for archiphoneme {Ă}"
%{Ă%}:ӑ <=> BackVow: [ ArchiCns: | Cns: | %>: ]+ _ ;
    except
        %{м%}: %>: _ н ;
        Vow: %>: _ н ;
\end{verbatim}

\section{Productive derivation}
I was initially stuck at this stage. I updated \texttt{chv.lexc} as instructed and ran the following Makefile, but the resulting analyser did not give the expected output:

\begin{verbatim}
all:
	./hfst-lexc chv.lexc.txt -o chv.lexc.hfst
	./hfst-twolc chv.twol -o chv.twol.hfst
	./hfst-compose-intersect -1 chv.lexc.hfst -2 chv.twol.hfst -o chv.gen.hfst
	./hfst-invert chv.gen.hfst -o chv.mor.hfst
\end{verbatim}

The cause of the problem was the missing nominative case, so I added

\begin{verbatim}
%<nom%>:%> # ;
\end{verbatim}

to \texttt{LEXICON CASES}, and everything started working.
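The last step of the Makefile, \texttt{hfst-invert}, turns the generator (analysis to surface form) into the analyser (surface form to analysis). A toy Python sketch of that idea follows. The dictionary stands in for the transducer, the helper names \texttt{invert} and \texttt{lookup} are hypothetical, and the single pair is the one shown in the loan-word output of this report.

```python
from collections import defaultdict

# Toy generator: analysis -> surface form. A real chv.gen.hfst holds many more pairs.
GENERATOR = {"специалист<n><gen>": "специалистӑн"}


def invert(generator):
    """Swap the two sides, collecting every analysis that yields a surface form."""
    analyser = defaultdict(list)
    for analysis, surface in generator.items():
        analyser[surface].append(analysis)
    return dict(analyser)


def lookup(analyser, form):
    """Return the analyses of a form; mimic hfst-lookup's "+?" for unknown forms."""
    return analyser.get(form, [form + "+?"])


ANALYSER = invert(GENERATOR)
print(lookup(ANALYSER, "специалистӑн"))  # ['специалист<n><gen>']
print(lookup(ANALYSER, "урам"))          # ['урам+?']
```

A form gets an analysis only if some complete path through the lexicon generates it, which is why a missing case entry shows up as \texttt{+?} at lookup time.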

\section{Loan words}
In this section I made the following changes:

\begin{verbatim}
Multichar_Symbols

%{ъ%}

Alphabet
%{ъ%}:0

Rules
"Non surface {ъ} if following %{Ă%}: followed by н"
%{ъ%}:0 <=> _ %>: %{Ă%}: н ;
\end{verbatim}

After that I got the required output:

\begin{verbatim}
$ ./hfst-fst2strings chv.gen.hfst | grep gen | grep специалист
специалист<n><gen>:специалистӑн
\end{verbatim}
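To see how the genitive rules interact, here is a small Python sketch that applies them to a lexical string by hand. It is a simplification under stated assumptions: the sets are copied from \texttt{chv.twol}, the function name \texttt{realize\_genitive} is hypothetical, \verb|{ъ}| is simply dropped everywhere, and \verb|{A}| is not handled. It is not how \texttt{hfst-twolc} works internally.

```python
import re

VOWELS = set("ӑаыоуяёюӗэиӳ")                  # Vow in chv.twol
BACK = set("ӑаыоуяёю") | {"{ъ}"}               # BackVow, including the loan-word marker
CONSONANTS = set("бвгджзклмнпрсҫтфхцчшщйьъ")  # Cns in chv.twol
SKIP = CONSONANTS | {"{м}", ">"}               # may stand between the vowel and {Ă}


def realize_genitive(lexical: str) -> str:
    """Apply the {Ă}, {м}, {ъ} and boundary rules to one lexical string."""
    s = re.findall(r"\{[^}]*\}|.", lexical)
    out = []
    for i, sym in enumerate(s):
        prev1 = s[i - 1] if i >= 1 else ""
        prev2 = s[i - 2] if i >= 2 else ""
        next1 = s[i + 1] if i + 1 < len(s) else ""
        if sym in (">", "{ъ}"):
            continue  # boundary and loan-word marker never surface
        if sym == "{м}":
            # {м} is deleted only before > {Ă} н, otherwise it surfaces as м
            if s[i + 1:i + 4] != [">", "{Ă}", "н"]:
                out.append("м")
            continue
        if sym == "{Ă}":
            # {Ă} is deleted after a vowel or {м} plus boundary, before н
            if prev1 == ">" and (prev2 in VOWELS or prev2 == "{м}") and next1 == "н":
                continue
            # otherwise harmony: look left past consonants, {м} and boundaries
            j = i - 1
            while j >= 0 and s[j] in SKIP:
                j -= 1
            is_back = 0 <= j < i - 1 and s[j] in BACK
            out.append("ӑ" if is_back else "ӗ")
            continue
        out.append(sym)
    return "".join(out)


print(realize_genitive("специалист{ъ}>{Ă}н"))  # специалистӑн
```

The printed form matches the \texttt{hfst-fst2strings} output above. Including \verb|{ъ}| in the back-vowel set is what lets a loan word ending in a consonant trigger back harmony.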

\section{Modified files overview}
Here is an overview of the files in their final state, with my comments.

\subsection{chv.lexc}

File content:
\begin{verbatim}
Multichar_Symbols

%<n%> ! Noun
%<pl%> ! Plural
%<nom%> ! Nominative case
%<ins%> ! Instrumental case
%<gen%> ! Genitive case
%{A%} ! Archiphoneme [а] or [е]
%{Ă%} ! Archiphoneme [ӑ], [ӗ] or zero
%> ! Morpheme boundary
%{м%} ! Archiphoneme [м] or zero (deleted in the plural genitive)
%<der_лӑх%> ! Derivational suffix
%{ъ%}
\end{verbatim}
My comment: This is the list of multicharacter symbols. Each one is treated as a single unit. The tags in angle brackets appear only on the analysis side, while the archiphonemes in curly brackets surface as different letters, or as nothing at all, according to the rules we state later.

File content:
\begin{verbatim}
LEXICON Root
Nouns ;
\end{verbatim}
My comment: Here we define the entry point. \texttt{Root} leads to the \texttt{Nouns} lexicon, the only word class described here.

File content:
\begin{verbatim}
LEXICON CASES
%<nom%>:%> # ;
%<ins%>:%>п%{A%} # ;
%<gen%>:%>%{Ă%}н # ;
\end{verbatim}
My comment: Here we list the possible cases and define how a word changes when each case is applied.

File content:
\begin{verbatim}
LEXICON PLURAL
CASES ;
%<pl%>:%>се%{м%} CASES ;
\end{verbatim}
My comment: Here we define the singular form (no suffix) and the plural form of a word. Both continue to the case endings.

File content:
\begin{verbatim}
LEXICON SUBST
CASES ;
PLURAL ;
\end{verbatim}
My comment: \texttt{SUBST} lists what can follow a derived noun: the case endings directly, or the plural suffix followed by the case endings.

File content:
\begin{verbatim}
LEXICON DER-N
%<der_лӑх%>:%>л%{Ă%}х SUBST "weight: 1.0" ;
\end{verbatim}
My comment: Here we state the derivation rule and assign it a weight, so that when several analyses compete we know which one should be selected.

File content:
\begin{verbatim}
LEXICON N
%<n%>: PLURAL ;
%<n%>: SUBST ;
%<n%>: DER-N ;
\end{verbatim}
My comment: Here we define the main paths that can apply to our nouns: number and case inflection, and derivation.

File content:
\begin{verbatim}
LEXICON Nouns
урам:урам N ; ! "street"
пакча:пакча N ; ! "garden"
хула:хула N ; ! "city"
канаш:канаш N ; ! "council"
тӗс:тӗс N ; ! "kind, appearance"
патша:патша N ; ! "tsar"
куҫ:куҫ N ; ! "eye"
патшалӑх:патшалӑх N ; ! "state"
специалист:специалист%{ъ%} N ; ! "specialist"
\end{verbatim}
My comment: The list of words to which the rules are applied.
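The continuation lexicons above can be read as a small graph: each entry adds something to the analysis side and to the surface side, then jumps to the next lexicon until it reaches \texttt{\#}. The Python sketch below walks that graph and prints pairs in the same \texttt{analysis:form} shape as \texttt{hfst-fst2strings}. It is a simplified model, assuming only two nouns and leaving out \texttt{SUBST} and \texttt{DER-N}; the names \texttt{LEXICONS} and \texttt{expand} are hypothetical.

```python
# Each lexicon maps to a list of (analysis side, surface side, continuation) entries.
# "#" marks the end of a word, as in lexc.
LEXICONS = {
    "Root": [("", "", "Nouns")],
    "Nouns": [("урам", "урам", "N"), ("специалист", "специалист{ъ}", "N")],
    "N": [("<n>", "", "PLURAL")],
    "PLURAL": [("", "", "CASES"), ("<pl>", ">се{м}", "CASES")],
    "CASES": [("<nom>", ">", "#"), ("<ins>", ">п{A}", "#"), ("<gen>", ">{Ă}н", "#")],
}


def expand(lexicon="Root", upper="", lower=""):
    """Yield every analysis:form pair reachable from the given lexicon."""
    if lexicon == "#":
        yield f"{upper}:{lower}"
        return
    for up, low, continuation in LEXICONS[lexicon]:
        yield from expand(continuation, upper + up, lower + low)


for pair in expand():
    print(pair)
```

Among the printed pairs are \verb|урам<n><gen>:урам>{Ă}н| and \verb|урам<n><pl><gen>:урам>се{м}>{Ă}н|, the same strings that \texttt{hfst-fst2strings} reported for the compiled lexicon earlier in this report.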

\subsection{chv.twol}

File content:
\begin{verbatim}
Alphabet
а ӑ е ё ӗ и о у ӳ ы э ю я б в г д ж з к л м н п р с ҫ т ф х ц ч ш щ й ь ъ
А Ӑ Е Ё Ӗ И О У Ӳ Ы Э Ю Я Б В Г Д Ж З К Л М Н П Р С Ҫ Т Ф Х Ц Ч Ш Щ Й Ь Ъ
%{э%}:0 %{л%}:0 %{с%}:0 %{а%}:0
%{A%}:а %{A%}:е
%{Ă%}:ӑ %{Ă%}:ӗ %{Ă%}:0
%{м%}:0 %{м%}:м
%{н%}:н %{н%}:0
%{ъ%}:0
;
\end{verbatim}
My comment: The alphabet lists the surface letters and the permitted realisations of each archiphoneme.

File content:
\begin{verbatim}
Sets

Vow = ӑ а ы о у я ё ю ӗ э и ӳ ;

BackVow = ӑ а ы о у я ё ю %{ъ%} ;

FrontVow = ӗ э и ӳ ;

ArchiCns = %{м%} ;

Cns = б в г д ж з к л м н п р с ҫ т ф х ц ч ш щ й ь ъ ;
\end{verbatim}
My comment: To write the rules more compactly, we group letters into sets.

File content:
\begin{verbatim}
Rules

"Remove morpheme boundary"
%>:0 <=> _ ;

"Back vowel harmony for archiphoneme {A}"
%{A%}:а <=> BackVow: [ Cns: | %>: ]+ _ ;

"Back vowel harmony for archiphoneme {Ă}"
%{Ă%}:ӑ <=> BackVow: [ ArchiCns: | Cns: | %>: ]+ _ ;
    except
        %{м%}: %>: _ н ;
        Vow: %>: _ н ;

"Non surface {Ă} in plural genitive"
%{Ă%}:0 <=> [ Vow: | %{м%}: ] %>: _ н ;

"Non surface {м} if following %{Ă%}: followed by н"
%{м%}:0 <=> _ %>: %{Ă%}: н ;

"Non surface {ъ} if following %{Ă%}: followed by н"
%{ъ%}:0 <=> _ %>: %{Ă%}: н ;
\end{verbatim}
My comment: These rules change the lexical forms in specific contexts.
For reference, before the nominative case was added the recommended lookup command returned no analysis, and at that point I could not fix it even though I followed the instructions given on Telegram:

\begin{verbatim}
$ echo патшалӑх | hfst-lookup -qp chv.mor.hfst
патшалӑх патшалӑх+? inf
\end{verbatim}

All related files in their final state can be found in the same folder as this report.

\end{document}
5 changes: 5 additions & 0 deletions 2018-komp-ling/practicals/Finite-state Morphology/Makefile
@@ -0,0 +1,5 @@
all:
	./hfst-lexc chv.lexc.txt -o chv.lexc.hfst
	./hfst-twolc chv.twol -o chv.twol.hfst
	./hfst-compose-intersect -1 chv.lexc.hfst -2 chv.twol.hfst -o chv.gen.hfst
	./hfst-invert chv.gen.hfst -o chv.mor.hfst
54 changes: 54 additions & 0 deletions 2018-komp-ling/practicals/Finite-state Morphology/chv.lexc.txt
@@ -0,0 +1,54 @@
Multichar_Symbols

%<n%> ! Noun
%<pl%> ! Plural
%<nom%> ! Nominative case
%<ins%> ! Instrumental case
%<gen%> ! Genitive case
%{A%} ! Archiphoneme [а] or [е]
%{Ă%} ! Archiphoneme [ӑ], [ӗ] or zero
%> ! Morpheme boundary
%{м%} ! Archiphoneme [м] or zero (deleted in the plural genitive)
%<der_лӑх%> ! Derivational suffix
%{ъ%}

LEXICON Root

Nouns ;

LEXICON CASES

%<nom%>:%> # ;
%<ins%>:%>п%{A%} # ;
%<gen%>:%>%{Ă%}н # ;

LEXICON PLURAL
CASES ;
%<pl%>:%>се%{м%} CASES ;


LEXICON SUBST
CASES ;
PLURAL ;

LEXICON DER-N

%<der_лӑх%>:%>л%{Ă%}х SUBST "weight: 1.0" ;

LEXICON N

%<n%>: PLURAL ;
%<n%>: SUBST ;
%<n%>: DER-N ;

LEXICON Nouns

урам:урам N ; ! "street"
пакча:пакча N ; ! "garden"
хула:хула N ; ! "city"
канаш:канаш N ; ! "council"
тӗс:тӗс N ; ! "kind, appearance"
патша:патша N ; ! "tsar"
куҫ:куҫ N ; ! "eye"
патшалӑх:патшалӑх N ; ! "state"
специалист:специалист%{ъ%} N ; ! "specialist"
45 changes: 45 additions & 0 deletions 2018-komp-ling/practicals/Finite-state Morphology/chv.twol
@@ -0,0 +1,45 @@
Alphabet
а ӑ е ё ӗ и о у ӳ ы э ю я б в г д ж з к л м н п р с ҫ т ф х ц ч ш щ й ь ъ
А Ӑ Е Ё Ӗ И О У Ӳ Ы Э Ю Я Б В Г Д Ж З К Л М Н П Р С Ҫ Т Ф Х Ц Ч Ш Щ Й Ь Ъ
%{э%}:0 %{л%}:0 %{с%}:0 %{а%}:0
%{A%}:а %{A%}:е
%{Ă%}:ӑ %{Ă%}:ӗ %{Ă%}:0
%{м%}:0 %{м%}:м
%{н%}:н %{н%}:0
%{ъ%}:0
;

Sets

Vow = ӑ а ы о у я ё ю ӗ э и ӳ ;

BackVow = ӑ а ы о у я ё ю %{ъ%} ;

FrontVow = ӗ э и ӳ ;

ArchiCns = %{м%} ;

Cns = б в г д ж з к л м н п р с ҫ т ф х ц ч ш щ й ь ъ ;

Rules

"Remove morpheme boundary"
%>:0 <=> _ ;

"Back vowel harmony for archiphoneme {A}"
%{A%}:а <=> BackVow: [ Cns: | %>: ]+ _ ;

"Back vowel harmony for archiphoneme {Ă}"
%{Ă%}:ӑ <=> BackVow: [ ArchiCns: | Cns: | %>: ]+ _ ;
except
%{м%}: %>: _ н ;
Vow: %>: _ н ;

"Non surface {Ă} in plural genitive"
%{Ă%}:0 <=> [ Vow: | %{м%}: ] %>: _ н ;

"Non surface {м} if following %{Ă%}: followed by н"
%{м%}:0 <=> _ %>: %{Ă%}: н ;

"Non surface {ъ} if following %{Ă%}: followed by н"
%{ъ%}:0 <=> _ %>: %{Ă%}: н ;