Replace Selenium with Direct HTTP Requests #1

@KaykCaputo

Description:

The project currently uses Selenium WebDriver to fetch data from URLs such as:
https://sisu-api.sisu.mec.gov.br/api/v1/oferta/instituicoes/uf
https://sisu-api.sisu.mec.gov.br/api/v1/oferta/instituicao/{iesCode}
https://sisu-api.sisu.mec.gov.br/api/v1/oferta/{offerCode}/modalidades

These endpoints return raw JSON, so launching a browser through Selenium, parsing the rendered HTML, and extracting the JSON from a <pre> tag is unnecessary. The browser round trip makes the program slower, more fragile, and more resource-heavy.
This issue proposes replacing all Selenium-based fetching with direct HttpClient GET requests.


Tasks

  • Create a static HttpClient instance, e.g.:
    private static readonly HttpClient client = new HttpClient();
  • Replace all driver.Navigate().GoToUrl(url) calls with:
    var json = await client.GetStringAsync(url);
  • Remove the HTML parsing done with HtmlAgilityPack
  • Deserialize the JSON response string directly using JsonConvert.DeserializeObject<T>
  • Preserve existing data structures (Institution, Course, CourseWeights, etc.)
  • Remove Selenium and HtmlAgilityPack from the project dependencies
  • Update README to reflect that Selenium is no longer required
  • Ensure all current features behave identically after the change
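
A minimal sketch of how the tasks above could fit together, assuming Newtonsoft.Json stays in the project (it provides JsonConvert). The class name SisuApi, the URL helper methods, and the generic FetchAsync<T> are hypothetical names introduced here for illustration; only the endpoint paths, the static HttpClient, GetStringAsync, and JsonConvert.DeserializeObject come from this issue.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;

// Hypothetical helper class; not part of the existing project.
public static class SisuApi
{
    private const string BaseUrl = "https://sisu-api.sisu.mec.gov.br/api/v1/oferta";

    // One shared instance for the whole process, as proposed in the task list.
    private static readonly HttpClient client = new HttpClient();

    // URL builders for the three endpoints listed in the description.
    public static string InstitutionsByStateUrl() =>
        $"{BaseUrl}/instituicoes/uf";

    public static string InstitutionUrl(string iesCode) =>
        $"{BaseUrl}/instituicao/{Uri.EscapeDataString(iesCode)}";

    public static string ModalitiesUrl(string offerCode) =>
        $"{BaseUrl}/{Uri.EscapeDataString(offerCode)}/modalidades";

    // Replaces driver.Navigate().GoToUrl(url) plus the <pre> extraction:
    // download the body as a string and deserialize it into T.
    public static async Task<T> FetchAsync<T>(string url)
    {
        string json = await client.GetStringAsync(url);
        return JsonConvert.DeserializeObject<T>(json);
    }
}
```

T would be whichever existing type (Institution, Course, CourseWeights, etc.) matches the endpoint's response shape, so the data structures themselves stay unchanged. Calls look like `await SisuApi.FetchAsync<T>(SisuApi.InstitutionUrl(iesCode))`.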

Why This Is Good for Beginners

  • Involves simple and clear modifications
  • Improves code performance and maintainability
  • No advanced browser automation required
  • Easy to test because the APIs are public and return consistent JSON
  • Gives contributors experience with networking and JSON parsing

Optional Enhancements

  • Add error handling for network failures
  • Implement a timeout for requests
  • Validate missing or malformed JSON fields
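
The three enhancements above could be sketched roughly as follows. This is one possible shape, not a prescribed design: the class SafeFetch, its method names, the 15-second timeout, and the choice to return null on failure are all assumptions made for illustration, and the field names passed to HasFields would have to come from the real API responses.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical helper class; not part of the existing project.
public static class SafeFetch
{
    // Timeout value is an arbitrary choice for this sketch.
    private static readonly HttpClient client = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(15)
    };

    // Returns null when the request fails, times out, or the body is not JSON.
    public static async Task<JToken?> TryGetJsonAsync(string url)
    {
        try
        {
            using HttpResponseMessage response = await client.GetAsync(url);
            response.EnsureSuccessStatusCode(); // throws on non-2xx status
            string body = await response.Content.ReadAsStringAsync();
            return ParseOrNull(body);
        }
        catch (HttpRequestException ex)
        {
            Console.Error.WriteLine($"Request failed for {url}: {ex.Message}");
            return null;
        }
        catch (TaskCanceledException)
        {
            // HttpClient reports an elapsed Timeout as a canceled task.
            Console.Error.WriteLine($"Request timed out for {url}");
            return null;
        }
    }

    // Parses a JSON string, returning null for empty or malformed input.
    public static JToken? ParseOrNull(string? body)
    {
        if (string.IsNullOrWhiteSpace(body)) return null;
        try { return JToken.Parse(body); }
        catch (JsonReaderException) { return null; }
    }

    // True when token is a JSON object in which every named field is
    // present and not null.
    public static bool HasFields(JToken? token, params string[] names) =>
        token is JObject obj &&
        names.All(n => obj[n] is JToken v && v.Type != JTokenType.Null);
}
```

Returning null keeps the calling code simple for a beginner-level change. A project that needs to distinguish failure causes could rethrow or return a result type instead.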
