
How can I parse HTML from a Blazor-based website? C#

THE PROBLEM

There is a web page with students' schedules: https://education.khai.edu/union/schedule/.

When I request that URL, I get "empty" HTML: the response contains only JS scripts and no schedule markup:

<!DOCTYPE html>
<html lang="ru-UA">
<head>

    ...

    <!--Blazor:{"sequence":1,"type":"server","descriptor":"CfDJ8MBhIHG65FtKlX56pWtbUtQZ6T285HAgOYeCRtbbe9JO4U4cZsYQJ9xvkrUrO01rnP\u002BgDnPMCl0MnI0E/fW58mYoqDZ3J1ztRz/DKm9\u002BABDrmL5ArBFfTFdeO82HHavNnd1E10j7gHBU9uqKmOW2otP1y5s/a\u002BnMT/P2jvdetcCcDQvdfLnX2/w747D4dYNA1MuoeBRlst63xJlks\u002BYeAfhhNMi1s961JEi777JANAEi\u002B9g\u002BNf7aS9sLn\u002BbJZ4m0IBrUnCcHbu3idntWrD/GDpgDVCwhrIhUIPhs8ITgqZHJdQprUnffKWflcMbJ6YyyWBBABTi2eOX/VMHvtFWxT8ABDgmXbyqC3vTfRe6VlwN5ibDYH/UKDkULoJuX\u002Bw\u002BQB2e3sSP1OddN/ud8pWe5\u002BuCo3\u002BkQ9OG6x2GLMXJHWgah"}-->
</head>
<body class="x-background">

<!--Blazor:{"sequence":0,"type":"server","descriptor":"CfDJ8MBhIHG65FtKlX56pWtbUtTdcyRUeUr\u002BhT344Mo3B4Gc0Gg3YwX1FY0c9owxv7oR1MDnLFR1BTJFjhuwYAjnROc3JT8UhSCkRbOdLVMuG0iwpNvwHNc47\u002BrguaHCTkDZKvZ9GKc0Jp\u002BCX0hcssqhCnp6eka\u002BG9Q7XF2B4ARhWnuJDKvUT\u002BbuWra063kFqG0Ixs4eWc4KrPRNS1KnTVu3QZrmx8r9dx6iyQXHjN/YgTqJhcv9LoQqWTfncbhBLwGm9l0BCTBLn3fGdJsOB6ES0lRwvVygmY7DA/2OGzhY7jGppr6UNaUXhdgo4xZDi3FkZgY3OL5xGS1p0bkc14UU9TM="}-->
    <script async src="/ui/app.js?v=3hNtGnhO8Vl6rh70OirKX4BnS6mxiiS5k9p3XAvofZA"></script>
    <script src="_framework/blazor.server.js" autostart="false"></script>
    <script src="_content/Blazor-Analytics/blazor-analytics.js"></script>
    <script>
    async function connectionDown() {
        console.log("Blazor disconnected");
        location.reload();
    }

    function connectionUp() {
        console.log("Blazor connected");
    }

    window.Blazor.start({
        reconnectionOptions: {
            maxRetries: 10,
            retryIntervalMilliseconds: 500,
        },
        reconnectionHandler: {
            onConnectionDown: e => connectionDown(e),
            onConnectionUp: e => connectionUp(e)
        }
    });
</script>
</body>
</html>

When I open the page in a browser, the site opens a SignalR stream and loads the schedule HTML through it.
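The `<!--Blazor:{...}-->` comments in the response above are plain JSON, so they can be extracted from the downloaded HTML before any SignalR work starts. The following is only a minimal sketch: the names `BlazorMarker` and `BlazorMarkerParser` are hypothetical, and it assumes every marker of interest has the `sequence`, `type` and `descriptor` fields seen in the response. As far as I can tell, these markers are what blazor.server.js collects before it starts a circuit, but treat that as an assumption; what the server expects to receive back is not established here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.RegularExpressions;

// Hypothetical container for one <!--Blazor:{...}--> marker.
public record BlazorMarker(int Sequence, string Type, string Descriptor);

public static class BlazorMarkerParser
{
    // Captures the JSON payload of each Blazor comment marker.
    private static readonly Regex MarkerRegex =
        new(@"<!--Blazor:(?<json>\{.*?\})-->", RegexOptions.Singleline);

    public static List<BlazorMarker> Parse(string html)
    {
        var markers = new List<BlazorMarker>();

        foreach (Match match in MarkerRegex.Matches(html))
        {
            using var doc = JsonDocument.Parse(match.Groups["json"].Value);
            var root = doc.RootElement;

            // Skip markers that do not have the three fields assumed above.
            if (!root.TryGetProperty("sequence", out var sequence) ||
                !root.TryGetProperty("type", out var type) ||
                !root.TryGetProperty("descriptor", out var descriptor))
            {
                continue;
            }

            // GetString() also decodes JSON escapes such as \u002B into '+'.
            markers.Add(new BlazorMarker(
                sequence.GetInt32(),
                type.GetString() ?? string.Empty,
                descriptor.GetString() ?? string.Empty));
        }

        // Order by the sequence number carried in each marker.
        markers.Sort((a, b) => a.Sequence.CompareTo(b.Sequence));
        return markers;
    }
}
```

Calling `BlazorMarkerParser.Parse(html)` on the body downloaded with `HttpClient` should return one entry per marker, with the `\u002B` escapes in each descriptor already decoded.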

How can I get a similar result from .NET?


SOME SUCCESSES

I found that the site uses the blazor.server.js script to work with the SignalR stream. Here is its source code: click.

CONNECTION

I tried to port the JS connection code to C#, and it worked. JS:

...
async function initializeConnection(options: CircuitStartOptions, logger: Logger, circuit: CircuitDescriptor): Promise<HubConnection> {
  const hubProtocol = new MessagePackHubProtocol();
  (hubProtocol as unknown as { name: string }).name = 'blazorpack';

  const connectionBuilder = new HubConnectionBuilder()
    .withUrl('_blazor')
    .withHubProtocol(hubProtocol);

  options.configureSignalR(connectionBuilder);

  const connection = connectionBuilder.build();
...

.NET - BlazorPackHubProtocol.cs (I couldn't find any other way to change the protocol name, so I wrote a wrapper around MessagePackHubProtocol):

using System;
using System.Buffers;
using Microsoft.AspNetCore.Connections;
using Microsoft.AspNetCore.SignalR;
using Microsoft.AspNetCore.SignalR.Protocol;
using Microsoft.Extensions.Options;

namespace KuzCode.SignalR.Protocols.BlazorPack
{
    public class BlazorPackHubProtocol : IHubProtocol
    {
        private readonly MessagePackHubProtocol _protocol; // assigned once in the constructor

        public string Name => "blazorpack"; // if the protocol has another name, connection fails
        public int Version => _protocol.Version;
        public TransferFormat TransferFormat => _protocol.TransferFormat;

        public BlazorPackHubProtocol(IOptions<MessagePackHubProtocolOptions> options)
        {
            _protocol = new(options);
        }

        public BlazorPackHubProtocol() : this(Options.Create(new MessagePackHubProtocolOptions())) { }

        public bool IsVersionSupported(int version) => _protocol.IsVersionSupported(version);

        public bool TryParseMessage(ref ReadOnlySequence<byte> input, IInvocationBinder binder, out HubMessage message)
            => _protocol.TryParseMessage(ref input, binder, out message);

        public void WriteMessage(HubMessage message, IBufferWriter<byte> output)
            => _protocol.WriteMessage(message, output);

        public ReadOnlyMemory<byte> GetMessageBytes(HubMessage message) => _protocol.GetMessageBytes(message);
    }
}

.NET - BlazorPackProtocolDependencyInjectionExtensions.cs (for easy use in connection builder):

using KuzCode.SignalR.Protocols.BlazorPack;
using Microsoft.AspNetCore.SignalR;
using Microsoft.AspNetCore.SignalR.Protocol;
using Microsoft.Extensions.DependencyInjection.Extensions;
using System;

namespace Microsoft.Extensions.DependencyInjection
{
    public static class BlazorPackProtocolDependencyInjectionExtensions
    {
        public static TBuilder AddBlazorPackProtocol<TBuilder>(this TBuilder builder) where TBuilder : ISignalRBuilder
            => builder.AddBlazorPackProtocol(_ => { });

        public static TBuilder AddBlazorPackProtocol<TBuilder>(this TBuilder builder, Action<MessagePackHubProtocolOptions> configure)
            where TBuilder : ISignalRBuilder
        {
            builder.Services.TryAddEnumerable(ServiceDescriptor.Singleton<IHubProtocol, BlazorPackHubProtocol>());
            builder.Services.Configure(configure);

            return builder;
        }
    }
}

.NET - KhaiClient.cs (main class with connection):

using Microsoft.AspNetCore.Http.Connections;
using Microsoft.AspNetCore.SignalR.Client;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using System;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace KuzCode.KhaiApiClient
{
    public class KhaiClient
    {
        private HubConnection _hubConnection;

        public KhaiClient()
        {
            _hubConnection = new HubConnectionBuilder()
                .WithUrl("wss://education.khai.edu/_blazor", configuration =>
                {
                    configuration.SkipNegotiation = false;
                    configuration.Transports = HttpTransportType.WebSockets;
                })
                .AddBlazorPackProtocol()
                .ConfigureLogging(logging =>
                {
                    logging.AddConsole();
                    logging.SetMinimumLevel(LogLevel.Debug);
                })
                .Build();
        }

        public async Task ConnectAsync() => await _hubConnection.StartAsync();

        public async Task DisconnectAsync() => await _hubConnection.StopAsync();
    }
}

.NET - Program.cs (for testing):

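var khaiClient = new KhaiClient();
await khaiClient.ConnectAsync(); // await instead of blocking with .Wait()

// Keep the process alive without spinning a CPU core.
await System.Threading.Tasks.Task.Delay(System.Threading.Timeout.Infinite);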

It works! I got the following log:

...
info: Microsoft.AspNetCore.Http.Connections.Client.Internal.WebSocketsTransport[1]
      Starting transport. Transfer mode: Binary. Url: 'wss://education.khai.edu/_blazor?id=_f5aporohBDfopQF-CExsA'.

...

info: Microsoft.AspNetCore.SignalR.Client.HubConnection[24]
      Using HubProtocol 'blazorpack v1'.
dbug: Microsoft.AspNetCore.SignalR.Client.HubConnection[28]
      Sending Hub Handshake.

...

dbug: Microsoft.AspNetCore.SignalR.Client.HubConnection[47]
      Receive loop starting.
info: Microsoft.AspNetCore.SignalR.Client.HubConnection[44]
      HubConnection started.

WHAT NEXT?

Next, I tried to reproduce some of the requests the browser sends, but had no success.

Request sent when selecting another group (I selected group 613п): [screenshot]

First response: [screenshot]


Source code: https://github.com/iiKuzmychov/KhaiApiClient.

Update:

I found the hub implementation I need and I can copy-paste its code, but I don't know how to initialize it.

asked Feb 02 '26 10:02 by iikuzmychov


1 Answer

One workaround is to create a desktop app that embeds a browser such as CefSharp (or an equivalent). You can then load the site, read the generated source, and/or execute JavaScript to parse it.

answered Feb 03 '26 22:02 by Flutter