11Kpdfsync THINGS-TO-DO
22---------------------------------------------------------------------------------------------------
33
4+ # Alpha Release
45# TASKS Estimated Actual
5- [X] String comparison algorithm, that can analyze the degree of match.
6+ [X] String comparison algorithm, that can analyze the degree of match.
67 So that minor differences between the pattern and the read text
78 from pdf files are handled.
8-
99[X] Use PDFClown library to highlight the text which matches the most
1010 with the highlighted text from My Clippings file.
11-
1211[X] Parse the 'My Clippings.txt' file.
13-
1412[-] Gui POC
13+ [X] Manual and automatic creation of associations between highlights
14+ and notes.
15+ [ ] Use grid layout for displaying and creating page mappings.
16+ (Not done, in favor of below)
17+ [X] Use custom renderer in list box to show highlight note mappings.
18+ [X] A separate dialog window for selection of notes for a highlight.
19+ [X] Logging
1520
16- [ ] Finalize GUI
17-
21+ # Beta Release
22+ # TASKS Estimated Actual
1823[ ] Optimization and cleanup of objects.
19-
24+ [ ] Lib - Use Iterator instead of Enumeration. (Not sure)
25+ [ ] GUI - Status bar showing last error or success message.
26+ [ ] Lib - parseLine function can be protected. It is public now.
27+ [ ] Lib - matching BOM bytes can be put inside a method in the
28+ ByteOrderMarkTypes enum. It is now separate, in the
29+ ByteOrderMark file.
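A minimal sketch of what the BOM item above could look like, assuming the enum owns its byte signatures. The name `BomType`, its constants, and the `matches`/`detect` methods are hypothetical stand-ins for the project's `ByteOrderMarkTypes`, not its actual API. UTF-32 is left out on purpose: its little-endian BOM begins with the UTF-16 LE bytes, so a real version would need to check the longer signature first.

```java
// Hypothetical sketch: each enum constant carries its own BOM bytes,
// so the matching logic lives next to the data it compares against.
enum BomType {
    UTF8(new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF}),
    UTF16_BE(new byte[] {(byte) 0xFE, (byte) 0xFF}),
    UTF16_LE(new byte[] {(byte) 0xFF, (byte) 0xFE}),
    UNKNOWN(new byte[0]);

    private final byte[] bom;

    BomType(byte[] bom) {
        this.bom = bom;
    }

    // True when the given header starts with this constant's BOM bytes.
    boolean matches(byte[] header) {
        if (bom.length == 0 || header.length < bom.length) {
            return false;
        }
        for (int i = 0; i < bom.length; i++) {
            if (header[i] != bom[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the first constant whose BOM matches, or UNKNOWN if none does.
    static BomType detect(byte[] header) {
        for (BomType type : values()) {
            if (type.matches(header)) {
                return type;
            }
        }
        return UNKNOWN;
    }
}
```

A caller would read the first few bytes of the clippings file and pass them to `BomType.detect(...)`.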
2030# BUGS:
21- [ ] The string matching algo is too simple, and gives wrong match
22- percentage, if the strings being compared differ in the number
23- of non-whitespace characters. The two indexes get out of sync
24- at the first mismatch and never recover.
31+ [ ] The string matching algo is too simple, and gives wrong match
32+ percentage, if the strings being compared differ in the number
33+ of non-whitespace characters. The two indexes get out of sync
34+ at the first mismatch and never recover.
2535 Example:
2636 PDF text = 123 56 789
2737 Clipping text = 123 456 789
2838 % match = 3/8 (Wrong)
2939 % match = 7/8 (What is expected)
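One possible way to keep the comparison from losing sync is to count a longest common subsequence (LCS) instead of comparing position by position. The sketch below is an assumption, not the project's algorithm: `MatchSketch` and its methods are hypothetical names, and `positionalMatches` only approximates the behaviour described in this bug. How the common-character count is turned into a percentage (and so whether it yields exactly the 7/8 above) is left to the project's own scoring rule.

```java
// Hypothetical sketch: positional comparison versus an LCS-based count.
final class MatchSketch {

    // Drops all whitespace, since the note compares non-whitespace characters.
    static String stripWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    // Approximates the buggy behaviour: compare index by index, so one
    // missing character shifts everything after it out of alignment.
    static int positionalMatches(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int count = 0;
        for (int i = 0; i < limit; i++) {
            if (a.charAt(i) == b.charAt(i)) {
                count++;
            }
        }
        return count;
    }

    // Classic dynamic-programming LCS length; tolerant of insertions
    // and deletions on either side.
    static int lcsLength(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.length()][b.length()];
    }
}
```

For the example above, after stripping whitespace, `positionalMatches` finds 3 matching characters, while `lcsLength` finds 8 characters in common, because the only difference is the missing `4`. The LCS table costs O(n*m) time and memory, which may matter for long page texts.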
3040
31- [ ] Related to the above bug, we are highlighting more characters -
32- by that many characters as the difference in the number of
33- characters, between the text read from the PDF and the pattern
41+ [ ] Related to the above bug, we are highlighting more characters -
42+ by that many characters as the difference in the number of
43+ characters, between the text read from the PDF and the pattern
3444 read from the clippings file.
3545 The algorithm matches character by character, the pattern and the
3646 text from the pdf. The matching and thus the highlighting is as
@@ -46,7 +56,37 @@ Kpdfsync
4656 highlighting)
4757
4858[ ] For some PDF files, org.pdfclown.tools.TextExtractor.extract() is returning null.
49- This is seen with the Concrete Mathematics original PDF file.
59+ This is seen with the Concrete Mathematics original PDF file. May be a font issue (the trace goes through Type1Font and PfbParser).
60+ Here is the stack trace:
61+ java.lang.NullPointerException
62+ at java.base/java.util.Hashtable.put(Hashtable.java:476)
63+ at org.pdfclown.documents.contents.fonts.PfbParser.parse(PfbParser.java:99)
64+ at org.pdfclown.documents.contents.fonts.Type1Font.getNativeEncoding(Type1Font.java:96)
65+ at org.pdfclown.documents.contents.fonts.Type1Font.loadEncoding(Type1Font.java:141)
66+ at org.pdfclown.documents.contents.fonts.SimpleFont.onLoad(SimpleFont.java:118)
67+ at org.pdfclown.documents.contents.fonts.Font.load(Font.java:738)
68+ at org.pdfclown.documents.contents.fonts.Font.<init>(Font.java:351)
69+ at org.pdfclown.documents.contents.fonts.SimpleFont.<init>(SimpleFont.java:62)
70+ at org.pdfclown.documents.contents.fonts.Type1Font.<init>(Type1Font.java:75)
71+ at org.pdfclown.documents.contents.fonts.Font.wrap(Font.java:249)
72+ at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:72)
73+ at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:1)
74+ at org.pdfclown.documents.contents.ResourceItems.get(ResourceItems.java:119)
75+ at org.pdfclown.documents.contents.objects.SetFont.getResource(SetFont.java:119)
76+ at org.pdfclown.documents.contents.objects.SetFont.getFont(SetFont.java:83)
77+ at org.pdfclown.documents.contents.objects.SetFont.scan(SetFont.java:97)
78+ at org.pdfclown.documents.contents.ContentScanner.moveNext(ContentScanner.java:1330)
79+ at org.pdfclown.documents.contents.ContentScanner$TextWrapper.extract(ContentScanner.java:811)
80+ at org.pdfclown.documents.contents.ContentScanner$TextWrapper.extract(ContentScanner.java:817)
81+ at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:777)
82+ at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:770)
83+ at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.get(ContentScanner.java:690)
84+ at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.access$0(ContentScanner.java:682)
85+ at org.pdfclown.documents.contents.ContentScanner.getCurrentWrapper(ContentScanner.java:1154)
86+ at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:633)
87+ at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:296)
88+ at coderarjob.kpdfsync.lib.annotator.PdfAnnotatorV1.highlight(PdfAnnotatorV1.java:62)
89+ at coderarjob.kpdfsync.poc.MainFrame$2.run(MainFrame.java:172)
5090
5191[ ] Highlight is not visible on the output PDF file. This was seen on the Concrete Mathematics
5292 cropped PDF file.
@@ -62,3 +102,74 @@ Kpdfsync
62102 5. Begin highlighting.
63103
64104 When this exception occurs, it occurs around the 73% mark.
105+
106+ [ ] EOFException in the org.pdfclown.tools.TextExtractor.extract() method. This is seen on the
107+ 'the_evolution_of_operating_system_cropped.pdf' file. Could also be a font issue.
108+ Here is the stack trace:
109+ java.lang.RuntimeException: java.io.EOFException
110+ at org.pdfclown.documents.contents.fonts.CffParser.load(CffParser.java:703)
111+ at org.pdfclown.documents.contents.fonts.CffParser.<init>(CffParser.java:640)
112+ at org.pdfclown.documents.contents.fonts.Type1Font.getNativeEncoding(Type1Font.java:104)
113+ at org.pdfclown.documents.contents.fonts.Type1Font.loadEncoding(Type1Font.java:151)
114+ at org.pdfclown.documents.contents.fonts.SimpleFont.onLoad(SimpleFont.java:118)
115+ at org.pdfclown.documents.contents.fonts.Font.load(Font.java:738)
116+ at org.pdfclown.documents.contents.fonts.Font.<init>(Font.java:351)
117+ at org.pdfclown.documents.contents.fonts.SimpleFont.<init>(SimpleFont.java:62)
118+ at org.pdfclown.documents.contents.fonts.Type1Font.<init>(Type1Font.java:75)
119+ at org.pdfclown.documents.contents.fonts.Font.wrap(Font.java:249)
120+ at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:72)
121+ at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:1)
122+ at org.pdfclown.documents.contents.ResourceItems.get(ResourceItems.java:119)
123+ at org.pdfclown.documents.contents.objects.SetFont.getResource(SetFont.java:119)
124+ at org.pdfclown.documents.contents.objects.SetFont.getFont(SetFont.java:83)
125+ at org.pdfclown.documents.contents.objects.SetFont.scan(SetFont.java:97)
126+ at org.pdfclown.documents.contents.ContentScanner.moveNext(ContentScanner.java:1330)
127+ at org.pdfclown.documents.contents.ContentScanner$TextWrapper.extract(ContentScanner.java:811)
128+ at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:777)
129+ at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:770)
130+ at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.get(ContentScanner.java:690)
131+ at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.access$0(ContentScanner.java:682)
132+ at org.pdfclown.documents.contents.ContentScanner.getCurrentWrapper(ContentScanner.java:1154)
133+ at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:633)
134+ at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:296)
135+ at coderarjob.kpdfsync.lib.annotator.PdfAnnotatorV1.highlight(PdfAnnotatorV1.java:62)
136+ at coderarjob.kpdfsync.poc.MainFrame$2.run(MainFrame.java:172)
137+ at java.base/java.lang.Thread.run(Thread.java:833)
138+ Caused by: java.io.EOFException
139+ at org.pdfclown.bytes.Buffer.readUnsignedShort(Buffer.java:511)
140+ at org.pdfclown.documents.contents.fonts.CffParser$Index.parse(CffParser.java:306)
141+ at org.pdfclown.documents.contents.fonts.CffParser$Index.parse(CffParser.java:324)
142+ at org.pdfclown.documents.contents.fonts.CffParser.load(CffParser.java:669)
143+ ... 27 more
144+ :: Cause #1
145+ java.io.EOFException
146+ at org.pdfclown.bytes.Buffer.readUnsignedShort(Buffer.java:511)
147+ at org.pdfclown.documents.contents.fonts.CffParser$Index.parse(CffParser.java:306)
148+ at org.pdfclown.documents.contents.fonts.CffParser$Index.parse(CffParser.java:324)
149+ at org.pdfclown.documents.contents.fonts.CffParser.load(CffParser.java:669)
150+ at org.pdfclown.documents.contents.fonts.CffParser.<init>(CffParser.java:640)
151+ at org.pdfclown.documents.contents.fonts.Type1Font.getNativeEncoding(Type1Font.java:104)
152+ at org.pdfclown.documents.contents.fonts.Type1Font.loadEncoding(Type1Font.java:151)
153+ at org.pdfclown.documents.contents.fonts.SimpleFont.onLoad(SimpleFont.java:118)
154+ at org.pdfclown.documents.contents.fonts.Font.load(Font.java:738)
155+ at org.pdfclown.documents.contents.fonts.Font.<init>(Font.java:351)
156+ at org.pdfclown.documents.contents.fonts.SimpleFont.<init>(SimpleFont.java:62)
157+ at org.pdfclown.documents.contents.fonts.Type1Font.<init>(Type1Font.java:75)
158+ at org.pdfclown.documents.contents.fonts.Font.wrap(Font.java:249)
159+ at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:72)
160+ at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:1)
161+ at org.pdfclown.documents.contents.ResourceItems.get(ResourceItems.java:119)
162+ at org.pdfclown.documents.contents.objects.SetFont.getResource(SetFont.java:119)
163+ at org.pdfclown.documents.contents.objects.SetFont.getFont(SetFont.java:83)
164+ at org.pdfclown.documents.contents.objects.SetFont.scan(SetFont.java:97)
165+ at org.pdfclown.documents.contents.ContentScanner.moveNext(ContentScanner.java:1330)
166+ at org.pdfclown.documents.contents.ContentScanner$TextWrapper.extract(ContentScanner.java:811)
167+ at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:777)
168+ at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:770)
169+ at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.get(ContentScanner.java:690)
170+ at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.access$0(ContentScanner.java:682)
171+ at org.pdfclown.documents.contents.ContentScanner.getCurrentWrapper(ContentScanner.java:1154)
172+ at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:633)
173+ at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:296)
174+ at coderarjob.kpdfsync.lib.annotator.PdfAnnotatorV1.highlight(PdfAnnotatorV1.java:62)
175+ at coderarjob.kpdfsync.poc.MainFrame$2.run(MainFrame.java:172)