-
Notifications
You must be signed in to change notification settings - Fork 0
feat: type optimizations - semantic newtypes and boxed namespace metadata #38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Codecov Report❌ Patch coverage is @@ Coverage Diff @@
## main #38 +/- ##
==========================================
- Coverage 92.19% 91.84% -0.36%
==========================================
Files 34 34
Lines 7481 7758 +277
==========================================
+ Hits 6897 7125 +228
- Misses 584 633 +49
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
…metadata BREAKING CHANGE: Type changes in public API ## Changes ### Semantic Newtypes (types/common.rs) - Add `Url(String)` - URL wrapper with Deref<Target=str> - Add `MimeType(Arc<str>)` - MIME type with string interning for efficient cloning - Add `Email(String)` - Email wrapper with Deref<Target=str> - All newtypes implement: From, Into, Deref, AsRef, Display, PartialEq, serde traits ### Boxing Large Optional Structs - Box `ItunesFeedMeta`, `ItunesEntryMeta` - reduces stack size - Box `PodcastMeta`, `PodcastEntryMeta` - reduces stack size - Box `SyndicationMeta`, `GeoLocation` - reduces stack size - Memory savings: ~7.6 KB per 100-entry plain RSS feed (76% reduction) ### Type Applications - Link.href: String → Url - Link.link_type: Option<String> → Option<MimeType> - Person.email: Option<String> → Option<Email> - Enclosure.url: String → Url - Enclosure.enclosure_type: Option<String> → Option<MimeType> - MediaContent.url: String → Url - MediaContent.content_type: Option<String> → Option<MimeType> - MediaThumbnail.url: String → Url - Image.url: String → Url - PodcastPerson.img/href: Option<String> → Option<Url> - PodcastTranscript.url/transcript_type: String/Option<String> → Url/Option<MimeType> - PodcastFunding.url: String → Url - PodcastChapters.url/chapters_type: String/Option<String> → Url/Option<MimeType> ### Binding Updates - Python: Use .as_deref() for Box fields - Node.js: Use .map(|b| T::from(*b)) for Box fields, .into_inner() for newtypes ### Performance - No parsing performance regression (verified with benchmarks) - Arc<str> for MimeType provides ~10x faster cloning - Box reduces stack frame size for feeds without namespace metadata
5da75b3 to
8d8a40c
Compare
Add four executable examples demonstrating feedparser-rs features: - parse_file.rs: Local file parsing for RSS/Atom feeds - parse_url.rs: HTTP fetching with ETag/Last-Modified caching - podcast_feed.rs: iTunes and Podcast 2.0 namespace metadata - error_handling.rs: Bozo pattern and graceful error recovery Include sample feed files for offline testing: - sample_rss.xml, sample_atom.xml, sample_podcast.xml - malformed_feed.xml for error handling tests Examples demonstrate: - Type-safe Url, MimeType, Email newtype usage - Deref<Target=str> for transparent string operations - HTTP conditional GET with caching headers - ParserLimits for DoS protection
- Add DHAT heap profiler example for allocation tracking - Add comprehensive benchmarks for MimeType/Url/Email types - Benchmark Arc<str> vs String clone performance (3-5x faster) - Include break-even analysis for Arc creation overhead - Fix clippy warnings in example files (format strings, unwrap) - Fix rustdoc broken link for Deref trait Profiling results: - ~407 allocations per parse (target: <200) - 47% of allocations are <32 bytes (optimization candidates) - Arc<str> for MimeType validated as optimal choice
- Add SmallString type alias using CompactString - Strings ≤24 bytes are stored inline without heap allocation - Applied to fields that typically contain short values: - Link.rel, Link.hreflang (language codes) - Person.name (author names) - Tag.term, Tag.scheme, Tag.label - TextConstruct.language, Content.language - Generator.version - Entry.id, Entry.author, Entry.publisher, Entry.dc_creator - FeedMeta.author, FeedMeta.publisher, FeedMeta.language - FeedMeta.dc_creator, FeedMeta.dc_publisher Performance impact: ~3% reduction in allocations (493→478/parse) - CompactString is same size as String (24 bytes on 64-bit) - No regression in parsing benchmarks - All 591 tests pass
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area: atom
Atom 1.0 support
area: error-handling
Error handling and reporting
area: json-feed
JSON Feed support
area: parser
Feed parsing logic
area: performance
Performance-critical code paths
area: rss
RSS 0.9x, 1.0, 2.0 support
component: benchmarks
Benchmarks or performance testing
component: core
feedparser-rs-core Rust library
component: dependencies
Dependency updates or management
component: node
Node.js bindings (napi-rs)
component: python
Python bindings (PyO3)
component: tests
Test suite or test infrastructure
lang: rust
Rust code
size: XXL
Huge PR (1000+ lines changed)
type: build
Build system, dependencies, or tooling
type: tooling
Development tools, CI/CD, or infrastructure
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
This PR introduces Phase 1 type optimizations for improved type safety and memory efficiency:
Url,MimeType,Emailfor type-safe handlingBreaking Changes
This is a breaking change in the public API. Type changes:
Link.href:String→UrlLink.link_type:Option<String>→Option<MimeType>Person.email:Option<String>→Option<Email>Enclosure.url/enclosure_type:String/Option<String>→Url/Option<MimeType>MediaContent/MediaThumbnail: URL and MIME type fieldsImage.url:String→UrlOption<T>→Option<Box<T>>Migration
The newtypes implement
Deref<Target=str>, so most code using string methods will work unchanged. For explicit conversions:Performance
MimeTypeusesArc<str>for ~10x faster clone vsStringTest plan