This should interest quite a few people here... I grant you, it is a bit low-level and in English. Let me give a quick summary:
po4a is a tool to simplify the translation (and above all the maintenance of translations) of documentation, thanks to gettext and friends.
For now, there are 3 modules:
- pod (for Perl documentation), which works perfectly
- man (the good old manual pages), which works as long as you don't show off in nroff. I.e., it works for 76% of the pages installed on my machine.
- kernelhelp (the documentation of the kernel compilation options), which works perfectly.
Next up is an sgml module in gestation, and a wider use of this program, to get feedback from users. Gerard, I am indeed thinking of you and your man page translation project...
Thanks, Mt.
----- Forwarded message from mquinson -----
Date: Fri, 27 Dec 2002 21:31:18 +0100
To: po4a-dev@nongnu.org
Cc: debian-i18n@lists.debian.org
Subject: Version 0.11 of po4a, and future directions
Hello,
I'm pleased to introduce a new version of po4a. This is version 0.11 (yup, same version as gettext ;)

General
=======
First of all, I moved back to a single-package organization. The former organization (i.e. liblocale-po4a-perl, po-pod and po-man) was a nightmare to maintain because most of the code was identical between po-pod and po-man. For example, the diff between pod-gettextize and man-gettextize was one line long, the only difference being which module you load (Locale::Po4a::Man.pm or Locale::Po4a::Pod.pm).
So, now, there is only one package, and only one series of binaries for all modules. Of course, each binary takes a new argument specifying which module you want to use.
That's really easier to maintain, but I'll run into problems when a po4a module depends on extra libs/programs, like po-debiandoc does with ngsml, for example. I guess po4a will recommend those extra packages, and try to fail gracefully if they are not present on the system, just like debconf does when curses isn't installed.
I still have some problems including the translated documentation of po4a in the generated deb file, but I think it's a detail, and won't go further on that issue here. Anyway, this translation is yet to be done ;)
Modules
=======
I've added a new binary to the package called po4a-identity, which is useful to test modules. It takes a document to translate as input, and produces its translation without using any po file. If the module is idempotent, the produced document should be exactly the same as the original one. This allows me to speak now about the status of each module.
In fact, no module can be idempotent at the source level, since we rewrap paragraphs. So, to test the idempotence of a module, one has to compare the text output generated from the original document with the one generated from the po4a-identity output.
Pod.pm
======
I ran tests on all the pod files on my machine, and Pod.pm runs almost perfectly. Here are the known problems:
1) The wrapping is sometimes changed.
.....................................
It's just weird. Sometimes, the two spaces after a ';' for example are kept by pod2man (or groff?), which seems plainly wrong to me.
2) Text is sometimes split at the wrong position
................................................
I have another problem with /usr/lib/perl5/Tk/MainWindow.pod (and some other pages, see below) which contains:
  C<" #n">
Unluckily, in the po4a-identity version, this was split on the space by the wrapping. As a result, in the original version, the man page contains " #n" and mine contains "" #n"", which is logical since C<blabla> is rewritten as "blabla".
Complete list of pages having this problem on my box (out of 564 pages; note that it depends on the chosen wrapping column):
  /usr/lib/perl5/Tk/MainWindow.pod
  /usr/share/perl/5.8.0/overload.pod
  /usr/share/perl/5.8.0/pod/perlapi.pod
  /usr/share/perl/5.8.0/pod/perldelta.pod
  /usr/share/perl/5.8.0/pod/perlfaq5.pod
  /usr/share/perl/5.8.0/pod/perlpod.pod
  /usr/share/perl/5.8.0/pod/perlre.pod
  /usr/share/perl/5.8.0/pod/perlretut.pod
Besides these two minor issues, Pod.pm seems quite usable now. Along the way, I had to fix a bug here and mask another there:
3) handling of the string "0" is error-prone in Perl
....................................................
I submitted the following patch to the Perl Bug Tracker:

--- Man.pm 2002-12-18 22:35:43.000000000 +0100
+++ /usr/share/perl/5.8.0/Pod/Man.pm 2002-12-18 22:36:27.000000000 +0100
@@ -759,7 +759,7 @@
         $index = $_;
         $index =~ s/^\s*[-*+o.]?(?:\s+|\Z)//;
     }
-    $_ = '*' unless $_;
+    $_ = '*' unless length($_);
     s/^*(\s|\Z)/\(bu$1/;
     if (@{ $$self{SHIFTS} } == @{ $$self{INDENTS} }) {
         $self->output (".RE\n");
----------------------------
Without this, "=item 0" was changed to "=item *" because the string "0" evaluates to false, even though it's not the empty string...
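To make the pitfall concrete, here is a small Python sketch that mimics Perl's truth test for strings. Python itself treats "0" as true, so the helper perl_true is an emulation written for this illustration, and the two bullet functions are hypothetical stand-ins for the two versions of the patched line, not code from Pod::Man.

```python
def perl_true(value: str) -> bool:
    """Emulate Perl's boolean test for strings: "" and "0" are both false."""
    return value not in ("", "0")


def bullet_buggy(item: str) -> str:
    # Mirrors `$_ = '*' unless $_;` : "0" is wrongly replaced by a bullet.
    return item if perl_true(item) else "*"


def bullet_fixed(item: str) -> str:
    # Mirrors `$_ = '*' unless length($_);` : only the empty string is replaced.
    return item if len(item) > 0 else "*"


if __name__ == "__main__":
    for item in ("", "0", "foo"):
        print(repr(item), "->", repr(bullet_buggy(item)), repr(bullet_fixed(item)))
```

The two variants only disagree on the input "0", which is exactly the "=item 0" case.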
4) pod2man lies and doesn't wrap anything; groff is smarter than podspec
........................................................................
I had a whole bunch of problems with pages containing stuff like:
FUNCTION { }
(there are a lot of them in the Tk bindings documentation). The problem is that the first line isn't indented, so the pod specification says that it should be wrapped. But in fact pod2man doesn't wrap anything itself, and lets groff do that for it. The problem is that in groff, the rule is that no indented line is wrapped. So, in the previous example, groff will indent the line "FUNCTION", and that's it.
To mask this bug, I made the Po4a::Pod.pm parser consider as verbatim any paragraph with at least one indented line. That way, I treat too many paragraphs as verbatim, but it should be harmless.
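The workaround boils down to a one-line test per paragraph. Here is a minimal Python sketch of that rule; the function name is_verbatim is made up for this illustration and is not taken from the Po4a::Pod.pm source.

```python
def is_verbatim(paragraph: str) -> bool:
    """Treat a paragraph as verbatim if at least one of its lines is indented."""
    return any(line[:1] in (" ", "\t") for line in paragraph.splitlines())


if __name__ == "__main__":
    print(is_verbatim("HEADER\n    indented line"))      # mixed paragraph
    print(is_verbatim("plain text\nstill plain text"))   # nothing indented
```

The strict pod rule would only look at the first line of the paragraph; this looser test also catches paragraphs whose first line is flush left.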
Man.pm
======
This module isn't idempotent at the source level, because of wrapping, and because I wanted to make the translator's life easier. So, if I see this chunk in the original:
| this is a stupid text, but
| .B be carefull
| it's not that easy to handle.
The translator will face this text in the po file (note the use of pod sequences):
| this is a stupid text, but B<be carefull> it's not that easy to handle.
And the produced text will contain
| this is a stupid text, but \fBbe carefull\fR it's not that easy to handle.
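The round trip shown above can be sketched in a few lines of Python. This is only an illustration restricted to the .B macro; the function names man_to_po and po_to_man are invented here and do not come from Man.pm.

```python
import re


def man_to_po(lines):
    """Join roff source lines into one string, turning '.B text' into B<text>."""
    parts = []
    for line in lines:
        if line.startswith(".B "):
            parts.append("B<" + line[3:].strip() + ">")
        else:
            parts.append(line.strip())
    return " ".join(parts)


def po_to_man(text):
    """Turn B<text> sequences back into roff font escapes (\\fB ... \\fR)."""
    return re.sub(r"B<([^<>]*)>", lambda m: "\\fB" + m.group(1) + "\\fR", text)


if __name__ == "__main__":
    source = ["this is a stupid text, but", ".B be carefull", "it's not that easy to handle."]
    po_text = man_to_po(source)
    print(po_text)
    print(po_to_man(po_text))
```

The output of po_to_man is not the same source as the input, which is why idempotence has to be checked on the formatted output instead.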
Raw results of tests
....................
Before I comment on them, here are the raw results:

# of pages          : 4323

Ignored pages       : 1432 (33%)
parser fails        :  850 (20% of all; 29% of unignored)

works perfectly     : 1660 (38% of all; 57% of unignored; 81% of processed)
change wrapping     :  239 ( 5% of all;  8% of unignored; 12% of processed)

undetected problems :  142 ( 3% of all;  5% of unignored;  7% of processed)
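For readers who want to recompute the three percentage columns, here is a short Python sketch. It assumes that "unignored" means all pages minus the ignored ones, and that "processed" means the unignored pages minus the parser failures; both are inferred from the counts above. Depending on the rounding used, a recomputed figure may differ by one point from the one quoted.

```python
TOTAL = 4323
COUNTS = {
    "ignored": 1432,
    "parser fails": 850,
    "works perfectly": 1660,
    "change wrapping": 239,
    "undetected problems": 142,
}

# Assumed definitions of the two reference populations.
unignored = TOTAL - COUNTS["ignored"]
processed = unignored - COUNTS["parser fails"]


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)


def shares(name):
    """Return (% of all, % of unignored, % of processed) for one category."""
    n = COUNTS[name]
    return pct(n, TOTAL), pct(n, unignored), pct(n, processed)


if __name__ == "__main__":
    print("unignored:", unignored, "processed:", processed)
    for name in ("works perfectly", "change wrapping", "undetected problems"):
        print(name, shares(name))
```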
Pages are ignored when they contain a comment indicating that they were produced from the pod format. In that case, po4a refuses to go further, and recommends that the user translate the source file, not this generated one. For pages generated by other means (like docbook2man), po4a will emit a warning and process the page.
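A rough Python sketch of this triage follows. The marker strings are assumptions chosen for illustration (Pod::Man does write a comment saying the page was generated by it, but the exact text po4a looks for is not stated here), and the function name classify is invented.

```python
# Assumed marker texts; the real checks in po4a may differ.
POD_MARKER = "generated by Pod::Man"
OTHER_MARKERS = ("docbook2man",)


def classify(page: str) -> str:
    """Return 'refuse', 'warn' or 'process' based on roff comment lines."""
    comments = "\n".join(
        line for line in page.splitlines() if line.startswith('.\\"')
    )
    if POD_MARKER in comments:
        return "refuse"   # translate the pod source instead
    if any(marker in comments for marker in OTHER_MARKERS):
        return "warn"     # generated page: warn, but go on
    return "process"


if __name__ == "__main__":
    print(classify('.\\" Automatically generated by Pod::Man\n.TH FOO 1'))
    print(classify('.\\" Produced by docbook2man\n.TH BAR 1'))
    print(classify('.TH BAZ 1\n.SH NAME'))
```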
The parser fails on pages based on mdoc(7), pages using conditionals with .if, pages defining new macros with .de, and more generally, pages being too clever in nroff for our simple parser (which is not a real interpreter).
To detect wrapping changes, we run diff on the generated cat files, and if a change is detected, we run a modified version of wdiff(1), which also ignores hyphenation changes. If wdiff doesn't detect any difference, we assume that the changes are harmless.
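The two-stage check can be approximated in Python as follows. This is a stand-in for the diff plus modified wdiff pipeline, not the actual scripts: it compares word sequences after undoing end-of-line hyphenation, and it deliberately errs on the side of ignoring hyphens altogether, so it would also miss a real change that only adds or removes a hyphen.

```python
import re


def canonical_words(text: str) -> list:
    """Word list of `text`, insensitive to line breaks and hyphenation."""
    joined = re.sub(r"-\n\s*", "", text)       # rejoin words split at a line end
    return joined.replace("-", "").split()     # drop remaining hyphens, then split


def compare(original: str, roundtrip: str) -> str:
    """Classify two formatted outputs as identical, wrapping only, or differing."""
    if original == roundtrip:
        return "identical"
    if canonical_words(original) == canonical_words(roundtrip):
        return "wrapping only"
    return "differs"


if __name__ == "__main__":
    a = "this is a translation tool\n"
    b = "this is a transla-\ntion tool\n"
    print(compare(a, a), "/", compare(a, b), "/", compare(a, "this is another tool\n"))
```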
But for 3% of the pages, po4a isn't idempotent and can be considered buggy. Most of the time, the changes are about fonts, with some characters being bold instead of italic, or so. But the difference may also be more serious. Reporting any such problem is good, but if you could come up with a fix, it would be even better. ;)
Arguable macro handling
.......................
Here is a list of macros, their definitions from groff(7) or man(7), and what I do with them. It's not optimal, but I don't have any better idea for them:

.de macro:
    Define or redefine macro until .. is encountered.
    Since we're not a real groff interpreter, we can't handle such cases. A possible improvement would be to read the macro name and its definition, compare this to well-known user macros, and accept it if the definition matches.

.ie cond anything:
    If cond then anything else goto .el.
.if cond anything:
    If cond then anything; otherwise do nothing.
    Same problem, but I've really no idea here.

.so filename:
    Include source file.
    Not sure what we should do here. For now, I offer the translator the ability to translate the filename. But maybe we shouldn't translate this at all, and let man search for the translated version of the file.
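The improvement suggested for .de could look like the following Python sketch: read the macro name and body up to the closing "..", then accept the definition only if it matches a table of known ones. The table content is purely hypothetical (the body given for EX is made up), as are the function names.

```python
# Hypothetical table of accepted user macros; the body below is invented
# for illustration and is not a real definition taken from any page.
KNOWN_MACROS = {
    "EX": [".nf", ".ft CW"],
}


def read_macro_definition(lines, start):
    """Parse a '.de NAME' block starting at lines[start].

    Returns (name, body_lines, index_of_next_line).
    """
    parts = lines[start].split()
    if len(parts) < 2 or parts[0] != ".de":
        raise ValueError("not a .de request: %r" % lines[start])
    name = parts[1]
    body = []
    i = start + 1
    while i < len(lines) and lines[i].strip() != "..":
        body.append(lines[i].rstrip())
        i += 1
    if i == len(lines):
        raise ValueError("unterminated definition of macro %r" % name)
    return name, body, i + 1


def is_known_macro(name, body):
    """Accept a definition only if it matches the known one exactly."""
    return KNOWN_MACROS.get(name) == body


if __name__ == "__main__":
    page = [".de EX", ".nf", ".ft CW", "..", ".SH NAME"]
    name, body, nxt = read_macro_definition(page, 0)
    print(name, body, nxt, is_known_macro(name, body))
```

An exact match is the most conservative choice; anything else would still make the parser give up, as it does today.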
Macros used but not understood
..............................
Here is a list of such macros (partial list, since the program fails on the first unknown macro):
  .. ." .AT .b .bank .BE ..br .Bu .BUGS .BY .ce .dbmmanage .do .DS .En .EP .EX .Fi .hw .i .Id .l .LO .mf .mso .N .na .NF .nh .nl .Nm .ns .NXR .OPTIONS .PB .pp .PR .PRE .PU .REq .RH .rn .S< .sh .SI .splitfont .Sx .T .TF .The .TT .UC .ul .Vb .zZ
Any input welcome.
Specific problems about some pages
..................................
/usr/share/man/man1/md5sum.1.gz:
Here is the diff at the output level:
  - d3b07384d113edec49eaa6238ad5ff00  md5-test-file
  + d3b07384d113edec49eaa6238ad5ff00 md5-test-file
Here is the diff at the macro level:
  -.B d3b07384d113edec49eaa6238ad5ff00\ md5-test-file
  +.B d3b07384d113edec49eaa6238ad5ff00 md5-test-file
The author wants to put an extra space at the end of the macro argument, but it fails because of the wrapping. Please turn the wrapping off, either by indenting the paragraph, or by using the .nf/.fi groff macros.
Conclusion about Man.pm
.......................
Since ignored pages are translatable with po4a::pod and since wrapping changes are acceptable in most cases, it looks like the current version of po4a can translate 76% of the man pages on my machine. Moreover, most of the untranslatable pages could be fixed with the simple tricks given above. Isn't that coooool?
As you can see, this module seems mature enough for wide use. I would still prefer some more testing before releasing the beast.
KernelHelp.pm
=============
It's a new module to handle the configuration help of each compilation option of the Linux kernel. There are several projects here and there to translate the /usr/src/linux/Configuration.help file, but no assisting tool. And this format is quite prehistoric (chunks are separated by '\n\n\n'; each chunk is of the form 'short desc\nvariable\nlongdesc, paragraphs separated by one empty line'). Moreover, since the documentation for all kernel options is stored in a single file, managing translations must be a nightmare. This would explain why 2.4 kernels aren't translated to French while 2.2 were.
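As an illustration of the format just described, here is a small Python parser sketch. It follows only the description given above (chunks separated by two empty lines, then short description, variable name, and paragraphs separated by one empty line); the names HelpEntry and parse_help are invented, and the sample text is made up, not taken from the kernel.

```python
from typing import List, NamedTuple


class HelpEntry(NamedTuple):
    short_desc: str
    variable: str
    paragraphs: List[str]


def parse_help(text: str) -> List[HelpEntry]:
    """Split a help file into entries, following the layout described above."""
    entries = []
    for chunk in text.strip("\n").split("\n\n\n"):
        lines = chunk.split("\n")
        if len(lines) < 2:
            continue  # not a well-formed chunk; skipped in this sketch
        body = "\n".join(lines[2:])
        paragraphs = [p for p in body.split("\n\n") if p.strip()]
        entries.append(HelpEntry(lines[0], lines[1], paragraphs))
    return entries


if __name__ == "__main__":
    sample = (
        "Foo support\nCONFIG_FOO\n  Say Y here.\n\n  If unsure, say N.\n"
        "\n\n"
        "Bar driver\nCONFIG_BAR\n  Help for bar.\n"
    )
    for entry in parse_help(sample):
        print(entry.variable, len(entry.paragraphs))
```

Each paragraph would then become one translatable string, which keeps unchanged paragraphs stable when a single option's help is edited.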
The module is done and seems to work. The only problem I know of is that wrapping is turned off for now, because the file contains tables which are not specifically indented and would get messed up if I turned wrapping on without the original being changed.
But that's a detail. The main problem is that we should patch make xconfig and such to use the translation as well. We'll see how much time I'll have to argue with the developers ;)
FUTURE DIRECTION
================
Once I get Man.pm running sufficiently well (i.e., no really hurtful diffs anymore), I'll release it for general consumption, and die from the numerous bug reports I'll certainly get... For example, it would be more than great if I could convince Gerard Delafond, who coordinates the translation of man pages to French, to use po4a. It should be possible: he comes from the KDE translation team, and they are already convinced of the value of po-based translation tools.
I now dream of a texinfo module, but this one seems harder (i.e., longer) to do. Not as hard as Man.pm, because I can steal code from texi2html, but this Perl script is pretty long and indigestible...
Another idea would be to embed the addendum files into po files, as regular comments. It would be rather useful for short addendums, like the ones for man pages, containing only the names of the translators. For bigger addendums, it would still be possible to use separate files.
AVAILABILITY
============
Until the end of the Xmas break, I can't synchronize my package pool. So, for now, get the package from here:
http://savannah.nongnu.org/download/po4a/
Thanks for reading 'till the end, Mt.