Merge pull request 'feat: add employee news links parsing and storage' (#28) from feature/employee-news-links into main

Reviewed-on: #28
feat: add employee news links parsing and storage
2026-05-22 15:52:23 +00:00 · 2026-05-22 18:50:25 +03:00 · 2026-05-15 14:40:29 +00:00 · 2026-05-15 17:39:41 +03:00 · 2026-05-14 10:30:06 +00:00 · 2026-05-14 13:29:27 +03:00
45 changed files with 4590 additions and 260 deletions
--- a/.env.example
+++ b/.env.example
@@ -14,7 +14,5 @@ PARSER_USE_PLAYWRIGHT=false
 ADMIN_USERNAME=admin
 ADMIN_PASSWORD=change-me
 SESSION_SECRET=change-me-session-secret
-MCP_TOKEN=change-me-mcp-token
 API_PORT=8000
 MCP_PORT=8001
--- a/.gitignore
+++ b/.gitignore
@@ -4,6 +4,8 @@ __pycache__/
 *.py[cod]
 *.db
 .pytest_cache/
 pytest-cache-files-*/
 .coverage
 htmlcov/
 postgres_data/
 MCP_DESCRIPTION.md
--- a/MCP_DESCRIPTION.md
+++ b/MCP_DESCRIPTION.md
@@ -0,0 +1,671 @@
 # MCP: operation, structure, and tools
 This document describes the MCP endpoint of the `miem-employees` service as currently implemented in `app/mcp.py`.
 ## Where the MCP endpoint lives
 - FastAPI router: `app.mcp.router`
 - Mounted into the application in: `app/main.py`
 - HTTP endpoint: `POST /mcp`
 - Locally, with a regular API launch: `http://localhost:8000/mcp`
 - In Docker Compose, through the separate `mcp` service: `http://localhost:8001/mcp`
 - Application-level authorization: none. The `Authorization` header is not checked and does not affect the response.
 If access to MCP must be restricted, do it in an external layer: bind to localhost, VPN, firewall, reverse proxy, or a separate network policy.
 ## Protocol
 The endpoint accepts JSON-RPC 2.0 over HTTP.
 General request format:
 ```json
 {
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/list",
  "params": {}
 }
 ```
 General format of a successful response:
 ```json
 {
  "jsonrpc": "2.0",
  "id": 1,
  "result": {}
 }
 ```
 General error format:
 ```json
 {
  "jsonrpc": "2.0",
  "id": 1,
  "error": {
    "code": -32601,
    "message": "Method not found"
  }
 }
 ```
 Supported MCP protocol version:
 ```text
 2024-11-05
 ```
 Service name:
 ```text
 miem-employees
 ```
 The server version is taken from `app.version.BACKEND_VERSION`.
 ## Supported JSON-RPC methods
 ### initialize
 Returns MCP server metadata and capabilities.
 Request:
 ```json
 {
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {}
 }
 ```
 Response:
 ```json
 {
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2024-11-05",
    "serverInfo": {
      "name": "miem-employees",
      "version": "0.7.0"
    },
    "capabilities": {
      "tools": {}
    }
  }
 }
 ```
 ### tools/list
 Returns the list of available tools with a JSON Schema for their arguments.
 Request:
 ```json
 {
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/list",
  "params": {}
 }
 ```
 The response contains the `result.tools` array.
 ### tools/call
 Calls a single tool by name.
 Request:
 ```json
 {
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_employees",
    "arguments": {
      "query": "Сергеев",
      "limit": 20
    }
  }
 }
 ```
 The tool response is always wrapped in an MCP content array:
 ```json
 {
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "{\"items\":[]}"
      }
    ]
  }
 }
 ```
 The `text` field contains JSON serialized with `ensure_ascii=False`. A client that needs the structured payload must parse this field as JSON.
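The double parse described above can be sketched in Python as follows. This is a minimal illustration, not code from `app/mcp.py`: the helper name `unwrap_tool_result` is hypothetical, and it assumes the response has already been decoded from the HTTP body into a `dict`.

```python
import json
from typing import Any


def unwrap_tool_result(response: dict) -> Any:
    """Return the structured payload carried in result.content[0].text.

    Hypothetical client-side helper; the name is not part of the service.
    """
    if "error" in response:
        # JSON-RPC level failure, for example -32601 or -32000.
        err = response["error"]
        raise RuntimeError(f"JSON-RPC error {err['code']}: {err['message']}")
    text = response["result"]["content"][0]["text"]
    # The text field is itself a JSON document, so it needs a second parse.
    return json.loads(text)


# Sample response shaped like the tools/call example in this document.
sample = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "{\"items\":[]}"}]},
}
payload = unwrap_tool_result(sample)
```

Keeping the second `json.loads` in one helper means the rest of a client can work with plain dictionaries and lists.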
 ## Errors
 - Unknown JSON-RPC method: `code = -32601`, `message = "Method not found"`.
 - Exceptions raised while handling a tool: `code = -32000`, `message` contains the exception text.
 - If an entity is not found inside a particular tool, both the HTTP and the JSON-RPC responses remain successful, and the payload contains `{"error": "not_found"}`.
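Because a missing entity is reported inside a successful response, a client has to look in two places to tell the outcomes apart. The sketch below shows one way to do that for `tools/call` responses; the function name and the returned labels are hypothetical, and only the two error codes and the `not_found` marker come from this document.

```python
import json

METHOD_NOT_FOUND = -32601  # unknown JSON-RPC method
TOOL_EXCEPTION = -32000    # exception raised while handling a tool


def classify_response(response: dict) -> str:
    """Map a decoded tools/call response to a coarse outcome label."""
    error = response.get("error")
    if error is not None:
        if error.get("code") == METHOD_NOT_FOUND:
            return "method_not_found"
        if error.get("code") == TOOL_EXCEPTION:
            return "tool_exception"
        return "rpc_error"
    # Success at the JSON-RPC level: inspect the tool payload itself.
    payload = json.loads(response["result"]["content"][0]["text"])
    if isinstance(payload, dict) and payload.get("error") == "not_found":
        return "not_found"
    return "ok"


not_found = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "{\"error\": \"not_found\"}"}]},
}
outcome = classify_response(not_found)
```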
 ## Data sources
 MCP reads data from the main database through a SQLAlchemy session from `app.db.get_db`.
 Main tables and models:
 - `employees`: the current employee record, status, profile, `current_data`, checksum.
 - `employee_publications`: normalized employee publications with authors, DOI, annotation, description, citation text, and raw JSON from HSE Publications.
 - `employee_news_links`: normalized links to news items from the "В новостях" (In the news) profile block, with title, URL, summary, date, publication year, and raw card JSON.
 - `crawl_runs`: history of crawl runs.
 - `crawl_run_employee_changes`: detailed employee changes within a run.
 - `crawl_errors`: crawl errors within a run.
 - `dataset_versions`: versions of the full employee dataset.
 - `dataset_version_items`: the contents of a specific dataset version.
 ## General structure of the employee payload
 Short employee record:
 ```json
 {
  "profile_key": "staff:avsergeev",
  "profile_id": "avsergeev",
  "full_name": "Сергеев Алексей Викторович",
  "status": "active",
  "canonical_url": "https://www.hse.ru/staff/avsergeev",
  "last_seen_at": "2026-05-14T10:00:00+00:00",
  "dismissed_at": null
 }
 ```
 The sync payload additionally includes `checksum`.
 The full record additionally contains:
 ```json
 {
  "data": {
    "contacts": {},
    "sections": []
  }
 }
 ```
 `data` corresponds to the parsed JSON of the employee profile. `sections` may contain sections with publications, courses, graduation theses (ВКР), news, tables, links, and free-form text blocks.
 Example of a news section inside `data.sections`:
 ```json
 {
  "title": "В новостях",
  "slug": "v_novostyah",
  "type": "news",
  "news_count": 1,
  "news_links": [
    {
      "title": "Название новости",
      "url": "https://www.hse.ru/news/edu/1153850518.html",
      "summary": "Краткое описание новости.",
      "published_at": "2026-04-28T00:00:00+00:00",
      "published_year": 2026
    }
  ]
 }
 ```
 There is currently no separate MCP tool for news: news items are available through `get_employee(...).data.sections` or through a full sync with `sync_employees(include_data=true)`.
 ## Tools
 ### get_service_info
 Purpose: return service metadata, the list of tools, and the current version of the employee dataset.
 Arguments: none.
 Returns:
 ```json
 {
  "service_name": "miem-employees",
  "backend_version": "0.7.0",
  "protocolVersion": "2024-11-05",
  "tools": [],
  "dataset": {
    "hash": "sha256",
    "previous_hash": "sha256 или null",
    "created_at": "2026-05-14T10:00:00+00:00",
    "crawl_run_id": 123,
    "employee_count": 100,
    "active_count": 95,
    "dismissed_count": 5
  }
 }
 ```
 Note: before responding, the service creates an up-to-date `dataset_version` if the current employee dataset does not yet have a version.
 ### sync_employees
 Purpose: synchronize the client's employee cache by dataset hash.
 Arguments:
 ```json
 {
  "client_hash": "sha256 или null",
  "include_data": true
 }
 ```
 - `client_hash`: hash of the version the client already has. If omitted, a full snapshot is returned.
 - `include_data`: controls whether the full `data` is included in employee records. Defaults to `true`.
 Full response without `client_hash`:
 ```json
 {
  "mode": "full",
  "from_hash": null,
  "to_hash": "current-sha256",
  "dataset": {},
  "items": []
 }
 ```
 If the client hash matches the current one:
 ```json
 {
  "mode": "delta",
  "from_hash": "current-sha256",
  "to_hash": "current-sha256",
  "dataset": {},
  "changes": {
    "added": [],
    "updated": [],
    "dismissed": [],
    "removed": []
  }
 }
 ```
 If `client_hash` is unknown to the server:
 ```json
 {
  "mode": "full",
  "from_hash": "missing",
  "to_hash": "current-sha256",
  "dataset": {},
  "items": [],
  "reason": "unknown_client_hash"
 }
 ```
 If `client_hash` is found and differs from the current one:
 ```json
 {
  "mode": "delta",
  "from_hash": "old-sha256",
  "to_hash": "current-sha256",
  "dataset": {},
  "changes": {
    "added": [],
    "updated": [],
    "dismissed": [],
    "removed": []
  }
 }
 ```
 Delta logic:
 - `added`: the employee appeared in the new version.
 - `updated`: the checksum or status changed, and the employee is active.
 - `dismissed`: the employee is present in the new version but has received the `dismissed` status.
 - `removed`: the `profile_key` was in the old version but is absent from the new one.
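The four buckets can be illustrated with a small Python sketch over two in-memory versions. It is not the service's code: `compute_delta` is a hypothetical name, each version is assumed to be a mapping from `profile_key` to a `(status, checksum)` tuple, and an employee who is new and already dismissed is assumed to land in `added`, which the description above does not settle.

```python
def compute_delta(old: dict, new: dict) -> dict:
    """Compare two versions shaped as {profile_key: (status, checksum)}."""
    delta = {"added": [], "updated": [], "dismissed": [], "removed": []}
    for key in sorted(new):
        status, checksum = new[key]
        if key not in old:
            # Assumption: a brand-new key is always reported as added.
            delta["added"].append(key)
        elif old[key] != (status, checksum):
            # Changed entry: dismissed employees get their own bucket.
            bucket = "dismissed" if status == "dismissed" else "updated"
            delta[bucket].append(key)
    # Keys that existed before but are gone from the new version.
    delta["removed"] = sorted(key for key in old if key not in new)
    return delta


old_version = {
    "staff:a": ("active", "c1"),
    "staff:b": ("active", "c2"),
    "staff:c": ("active", "c3"),
    "staff:d": ("active", "c4"),
}
new_version = {
    "staff:a": ("active", "c1"),
    "staff:b": ("active", "c2-changed"),
    "staff:c": ("dismissed", "c3"),
    "staff:e": ("active", "c5"),
}
delta = compute_delta(old_version, new_version)
```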
 The dataset hash is computed over the sorted list of `{profile_key, status, checksum}`.
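A hash of this kind does not depend on the order in which employees are read. The sketch below shows the idea in Python under stated assumptions: the document does not specify the digest or the serialization, so SHA-256 over compact JSON is chosen here for illustration only, and `dataset_hash` is a hypothetical name.

```python
import hashlib
import json


def dataset_hash(items: list[dict]) -> str:
    """Hash the sorted list of {profile_key, status, checksum} triples."""
    rows = sorted(
        (
            {
                "profile_key": item["profile_key"],
                "status": item["status"],
                "checksum": item["checksum"],
            }
            for item in items
        ),
        key=lambda row: row["profile_key"],
    )
    # Assumption: compact JSON with sorted keys as the canonical byte form.
    encoded = json.dumps(rows, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


employees = [
    {"profile_key": "staff:b", "status": "dismissed", "checksum": "c2"},
    {"profile_key": "staff:a", "status": "active", "checksum": "c1"},
]
current_hash = dataset_hash(employees)
```

Because only these three fields feed the hash, any change in an employee's status or checksum produces a new dataset version, while reordering the input does not.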
 ### search_employees
 Purpose: find employees by full name or canonical URL.
 Arguments:
 ```json
 {
  "query": "Сергеев",
  "status": "active",
  "limit": 20
 }
 ```
 - `query`: required by the schema, but in the code an empty string means a search without a text filter.
 - `status`: optional, only `active` or `dismissed`.
 - `limit`: at most 100, defaults to 20.
 Returns an array of short employee payloads without `data`:
 ```json
 [
  {
    "profile_key": "staff:avsergeev",
    "profile_id": "avsergeev",
    "full_name": "Сергеев Алексей Викторович",
    "status": "active",
    "canonical_url": "https://www.hse.ru/staff/avsergeev",
    "last_seen_at": "2026-05-14T10:00:00+00:00",
    "dismissed_at": null
  }
 ]
 ```
 ### get_employee
 Purpose: fetch a single employee record.
 Arguments:
 ```json
 {
  "profile_id_or_url": "avsergeev"
 }
 ```
 The lookup matches on:
 - `profile_key`
 - `profile_id`
 - exact `canonical_url`
 - partial match of `canonical_url`
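The four criteria can be read as a single predicate. The Python sketch below applies it to an in-memory record instead of a SQLAlchemy query; `matches_employee` is a hypothetical name, the substring check is case-sensitive here, and whether the real query is case-insensitive or tries the criteria in a fixed order is not stated in this document.

```python
def matches_employee(employee: dict, needle: str) -> bool:
    """Return True if needle identifies the employee by key, id, or URL."""
    url = employee.get("canonical_url") or ""
    return (
        needle == employee.get("profile_key")
        or needle == employee.get("profile_id")
        or needle == url
        # Partial URL match; an empty needle must not match everything.
        or (needle != "" and needle in url)
    )


employee = {
    "profile_key": "org_person:133709486",
    "profile_id": "133709486",
    "canonical_url": "https://www.hse.ru/org/persons/133709486",
}
found_by_fragment = matches_employee(employee, "persons/133709486")
```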
 Returns the full employee payload with `data`.
 If the employee is not found:
 ```json
 {
  "error": "not_found"
 }
 ```
 ### list_employee_publications
 Purpose: return an employee's publications. If normalized rows exist in `employee_publications`, the tool returns detailed publication data: authors, DOI, annotation, description, citation text, year, type, language, status, and links. If the detailed table has not been populated yet, the tool uses the legacy fallback from `employees.current_data.sections[].publications`.
 Arguments:
 ```json
 {
  "profile_id_or_url": "avsergeev"
 }
 ```
 The employee lookup works the same way as in `get_employee`: by `profile_key`, `profile_id`, or an exact or partial `canonical_url`.
 Source order:
 - first, `employee_publications`, sorted by year, title, and internal id;
 - if there are no rows, the `current_data.sections` sections with `type = "publications"` and their `publications` arrays.
 Response:
 ```json
 {
  "employee": {
    "profile_key": "org_person:803294906",
    "profile_id": "803294906",
    "full_name": "Борисов Сергей Петрович",
    "status": "active",
    "canonical_url": "https://www.hse.ru/org/persons/803294906",
    "last_seen_at": "2026-05-14T10:00:00+00:00",
    "dismissed_at": null
  },
  "items": [
    {
      "id": "888959076",
      "publication_id": "888959076",
      "title": "Название публикации",
      "text": "Краткое описание или citation",
      "url": "https://publications.hse.ru/view/888959076",
      "year": 2023,
      "type": "ARTICLE",
      "publication_type": "ARTICLE",
      "language": "ru",
      "status": 1,
      "doi_url": "https://doi.org/10.53921/18195822_2023_23_4_624",
      "other_url": "https://example.test",
      "document_url": "https://example.test/file.pdf",
      "citation_text": "Авторы. Название публикации // Журнал. 2023.",
      "annotation": {
        "ru": "Аннотация",
        "en": "Abstract"
      },
      "description": {
        "main": "Авторы. Название публикации // Журнал. 2023."
      },
      "authors": [
        {
          "id": "803294906",
          "href": "https://www.hse.ru/org/persons/803294906",
          "title_ru": "Борисов С. П.",
          "title_en": "",
          "reverse_title_ru": "С. П. Борисов",
          "reverse_title_en": "",
          "alt_name": "S. P. Borisov",
          "other_name": null,
          "is_current_employee": true
        }
      ]
    }
  ]
 }
 ```
 In fallback mode from `current_data`, legacy items may contain only the basic fields `title`, `text`, `url`, and `id`.
 If the employee is not found:
 ```json
 {
  "items": []
 }
 ```
 If the employee is found but has no publications:
 ```json
 {
  "employee": {},
  "items": []
 }
 ```
 ### list_employee_courses
 Purpose: return the courses taught by an employee, taken from the parsed profile sections.
 Arguments:
 ```json
 {
  "profile_id_or_url": "avsergeev"
 }
 ```
 The service looks for `current_data.sections` sections with `type = "courses_by_year"` and merges their `courses` arrays.
 Response:
 ```json
 {
  "employee": {},
  "items": [
    {
      "title": "Название курса",
      "url": "https://..."
    }
  ]
 }
 ```
 If the employee or the profile data is missing:
 ```json
 {
  "items": []
 }
 ```
 ### get_crawl_status
 Purpose: return the most recent crawl run.
 Arguments: none.
 Response:
 ```json
 {
  "id": 123,
  "status": "completed",
  "source_url": "https://miem.hse.ru/persons",
  "started_at": "2026-05-14T10:00:00+00:00",
  "finished_at": "2026-05-14T10:10:00+00:00",
  "found_count": 100,
  "parsed_count": 98,
  "error_count": 2,
  "dismissed_count": 1
 }
 ```
 If there have been no runs yet:
 ```json
 {
  "status": "never_run"
 }
 ```
 ### get_crawl_run_details
 Purpose: return detailed information about a specific crawl run: summary, employee changes, and errors.
 Arguments:
 ```json
 {
  "run_id": 123
 }
 ```
 Response:
 ```json
 {
  "id": 123,
  "source_url": "https://miem.hse.ru/persons",
  "status": "completed",
  "status_display": "Завершен",
  "started_at": "2026-05-14T10:00:00+00:00",
  "finished_at": "2026-05-14T10:10:00+00:00",
  "started_display": "14.05.2026 13:00",
  "finished_display": "14.05.2026 13:10",
  "found_count": 100,
  "parsed_count": 98,
  "new_count": 3,
  "error_count": 2,
  "dismissed_count": 1,
  "processed_count": 100,
  "progress_percent": 100.0,
  "message": null,
  "changes_detail_available": true,
  "changes": {
    "new": [],
    "missing_from_source": [],
    "dismissed": []
  },
  "errors": []
 }
 ```
 If the run is not found:
 ```json
 {
  "error": "not_found"
 }
 ```
 ## curl examples
 List of tools:
 ```bash
 curl http://localhost:8001/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'
 ```
 Employee search:
 ```bash
 curl http://localhost:8001/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"search_employees","arguments":{"query":"Сергеев","limit":5}}}'
 ```
 Full sync:
 ```bash
 curl http://localhost:8001/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"sync_employees","arguments":{"include_data":false}}}'
 ```
 Delta sync:
 ```bash
 curl http://localhost:8001/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":4,"method":"tools/call","params":{"name":"sync_employees","arguments":{"client_hash":"known-sha256","include_data":true}}}'
 ```
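For clients written in Python, the same requests can be built with the standard library. The sketch below only constructs the request object and does not send it; `build_tools_call` is a hypothetical helper, and the URL assumes the Docker Compose port used in the curl examples.

```python
import json
import urllib.request

MCP_URL = "http://localhost:8001/mcp"  # assumption: Docker Compose `mcp` service


def build_tools_call(name: str, arguments: dict, request_id: int) -> urllib.request.Request:
    """Build a JSON-RPC 2.0 tools/call request for the MCP endpoint."""
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # Supplying data makes urllib issue a POST, matching `POST /mcp`.
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


req = build_tools_call("search_employees", {"query": "Сергеев", "limit": 5}, request_id=2)
```

Passing `req` to `urllib.request.urlopen` would send it; the response body is JSON whose `result.content[0].text` needs a second `json.loads`.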
 ## How a client uses MCP
 1. The client calls `initialize` and checks `protocolVersion`.
 2. The client calls `tools/list` to get the current list of tools and input schemas.
 3. For searches and targeted queries, the client calls `tools/call` with `search_employees`, `get_employee`, `list_employee_publications`, `list_employee_courses`, `get_crawl_status`, or `get_crawl_run_details`.
 4. For the local cache, the client calls `get_service_info` or `sync_employees`.
 5. The client stores the latest `dataset.hash`.
 6. On the next sync, the client passes that hash as `client_hash`.
 7. The server returns an empty delta, a delta with changes, or a full snapshot if the hash is unknown.
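Steps 5 to 7 can be sketched as a function that applies a decoded `sync_employees` payload to a local cache and returns the hash to store. This is an illustrative client-side sketch: `apply_sync` is a hypothetical name, and because the document shows the delta lists only as empty arrays, the code accepts either employee objects with `profile_key` or bare key strings.

```python
def apply_sync(cache: dict, payload: dict) -> str:
    """Apply a full snapshot or a delta to cache; return the new hash."""

    def key_of(item):
        # Assumption: list entries are employee objects or bare profile keys.
        return item["profile_key"] if isinstance(item, dict) else item

    if payload["mode"] == "full":
        # A full snapshot replaces whatever the client had before.
        cache.clear()
        for item in payload["items"]:
            cache[key_of(item)] = item
    else:
        changes = payload["changes"]
        for bucket in ("added", "updated", "dismissed"):
            for item in changes[bucket]:
                cache[key_of(item)] = item
        for item in changes["removed"]:
            cache.pop(key_of(item), None)
    return payload["to_hash"]


cache: dict = {}
first_hash = apply_sync(cache, {
    "mode": "full", "from_hash": None, "to_hash": "h1",
    "items": [
        {"profile_key": "staff:a", "status": "active"},
        {"profile_key": "staff:b", "status": "active"},
    ],
})
second_hash = apply_sync(cache, {
    "mode": "delta", "from_hash": "h1", "to_hash": "h2",
    "changes": {
        "added": [{"profile_key": "staff:c", "status": "active"}],
        "updated": [],
        "dismissed": [{"profile_key": "staff:a", "status": "dismissed"}],
        "removed": ["staff:b"],
    },
})
```

The returned value is what the client would pass as `client_hash` on the next call.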
 ## Important implementation notes
 - The MCP endpoint is read-only: tools do not start crawls and do not modify employees directly.
 - `get_service_info` and `sync_employees` may create a new `dataset_versions` row if the employee state has changed and no new version exists yet.
 - All tool payloads are returned as a JSON string inside `content[0].text`.
 - `search_employees` searches with `ilike` on `full_name` and `canonical_url`.
 - `get_employee` accepts a partial URL, so the string `133709486` can match `https://www.hse.ru/org/persons/133709486`.
 - Time values are serialized with `isoformat()`; display fields for admin payloads are formatted in the `Europe/Moscow` time zone.
--- a/README.md
+++ b/README.md
@@ -6,10 +6,10 @@
 - `api`: FastAPI, REST API, HTML admin panel, healthcheck.
 - `worker`: weekly scheduler that starts a crawl according to `CRAWL_CRON`.
- `mcp`: HTTP MCP endpoint with a bearer token.
+- `mcp`: open HTTP MCP endpoint for AI agents.
 - `postgres`: main database.
-The parser uses a fixed employee source, by default `https://miem.hse.ru/persons`. For each record it stores the full name, positions, start year, contacts, identifiers, profile tabs, sections, publications, courses, graduation theses (ВКР), a JSON snapshot, and a compressed HTML snapshot. Links are followed only from the employee's own profile menu (`person-menu`), for example `#sci`, `#teaching`, `#main`.
+The parser uses a fixed employee source, by default `https://miem.hse.ru/persons`. For each record it stores the full name, positions, start year, contacts, identifiers, profile tabs, sections, publications, courses, graduation theses (ВКР), news, a JSON snapshot, and a compressed HTML snapshot. Detailed publications are additionally normalized into a separate `employee_publications` table, and news items from the "В новостях" (In the news) block into `employee_news_links`. Links are followed only from the employee's own profile menu (`person-menu`), for example `#sci`, `#teaching`, `#main`.
 ## Environment variables
@@ -27,7 +27,6 @@ cp .env.example .env
 - `CRAWL_LIMIT`: optional limit on the number of profiles for a test run.
 - `ADMIN_USERNAME`, `ADMIN_PASSWORD`: admin panel login and password.
 - `SESSION_SECRET`: cookie signing secret.
- `MCP_TOKEN`: bearer token for `/mcp`.
 - `PARSER_USE_PLAYWRIGHT`: enables Playwright rendering of dynamic tabs.
 ## Local run
@@ -45,7 +44,6 @@ uvicorn app.main:app --reload
 - `Dashboard`: overall statistics, the most recently added employee, progress of the current/latest crawl, and manual start.
 - `Directory`: configurable employee table with filters, sorting, pagination, and column selection.
 - `Employees`: simple legacy employee table.
 - `Runs`: run history, errors, and progress bar.
 ## Docker Compose
@@ -60,7 +58,27 @@ docker compose up --build
 - MCP: `http://localhost:8001/mcp`
 - Postgres: `localhost:5432`
-Tables are created by the application at startup. The SQL migration for manual application is in `migrations/001_init.sql`.
+Tables are created by the application at startup. When upgrading an existing database, the application also adds missing runtime columns, for example `crawl_runs.skipped_count`. SQL migrations for manual application are in `migrations/`.
 ## Populating the database
 The main employee record is stored in `employees`: profile, status, discovery/dismissal dates, the current `current_data` JSON, checksum, and parser version. The history of successful changes is saved in `employee_snapshots` together with a JSON snapshot and the compressed profile HTML.
 Publications are now stored in two forms:
 - a short list remains inside `employees.current_data.sections[].publications` for backward compatibility;
 - detailed records are saved in `employee_publications` and linked to the employee through `employee_id`.
 `employee_publications` contains `publication_id`, title, year, publication type, language, status, a link to the HSE Publications record, DOI, external/document links, citation text, annotation, description, authors, the raw JSON of the `searchPubs` response, and `source_hash` for safe repeated upserts. Uniqueness is enforced on `(employee_id, publication_id)` and `(employee_id, source_hash)`, so a repeated crawl should not create duplicates.
 `list_employee_publications` reads `employee_publications` first; if there are no detailed rows yet, it returns the legacy publications from `current_data`.
 Employee news is also stored in two forms:
 - a short list remains inside `employees.current_data.sections[].news_links`;
 - normalized cards from the "В новостях" (In the news) tab are saved in `employee_news_links`.
 `employee_news_links` contains the news title, link, summary, publication date, publication year, the raw card JSON, and `source_hash`. Uniqueness is enforced on `(employee_id, url)` and `(employee_id, source_hash)`, so a repeated crawl does not create duplicates.
 ## Crawling
@@ -75,32 +93,43 @@ curl -X POST http://localhost:8000/api/crawl-runs --cookie "miem_admin_session=.
 - employees that are found get the `active` status and an updated `last_seen_at`;
 - new employees are added to `employees`;
 - the number of new employees per run is saved in `crawl_runs.new_count`;
 - publications from HSE Publications are written to `employee_publications`, while the short list stays in the profile JSON;
 - news items from the "В новостях" (In the news) block are written to `employee_news_links`, while the short list stays in the profile JSON;
 - active employees that have disappeared from the current source list get the `dismissed` status and `dismissed_at`;
- every successful parse saves a row in `employee_snapshots`.
+- every successful parse of a new or changed profile saves a row in `employee_snapshots`;
 - unchanged profiles are counted in `crawl_runs.skipped_count` and do not get a new snapshot.
-During a crawl, `found_count`, `parsed_count`, and `error_count` are updated in the database. The admin panel polls `/api/crawl-runs/latest` and shows progress as `parsed_count + error_count / found_count`.
+During a crawl, `found_count`, `parsed_count`, `skipped_count`, and `error_count` are updated in the database. The admin panel polls `/api/crawl-runs/latest` and shows progress as `(parsed_count + skipped_count + error_count) / found_count`.
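The parentheses in the updated formula matter: read literally, the removed wording divides only `error_count` by `found_count`. A minimal Python sketch of the corrected calculation follows; `progress_percent` as a function, the guard for an empty run, the cap at 100, and the rounding are assumptions, not behavior confirmed by this diff.

```python
def progress_percent(found: int, parsed: int, skipped: int, errors: int) -> float:
    """Share of discovered profiles that have been processed, in percent."""
    if found <= 0:
        # Assumption: nothing discovered yet means no measurable progress.
        return 0.0
    processed = parsed + skipped + errors
    # Assumption: cap at 100 in case counters briefly overshoot found.
    return round(min(processed / found, 1.0) * 100, 1)


halfway = progress_percent(found=200, parsed=50, skipped=40, errors=10)
```

Counting skipped profiles keeps the bar moving on runs where most profiles are unchanged.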
 ## MCP
-Endpoint: `POST /mcp`, authorization `Authorization: Bearer <MCP_TOKEN>`.
+Endpoint: `POST /mcp`, with no application-level authorization.
 Supported tools:
 - `get_service_info()`
 - `sync_employees(client_hash?, include_data?)`
 - `search_employees(query, status?, limit?)`
 - `get_employee(profile_id_or_url)`
- `list_employee_publications(profile_id_or_url)`
+- `list_employee_publications(profile_id_or_url)`: an employee's publications; when data from `employee_publications` is available, returns authors, DOI, annotation, description, citation text, year, type, language, status, and the HSE Publications link.
 - `list_employee_courses(profile_id_or_url)`
 - `get_crawl_status()`
 - `get_crawl_run_details(run_id)`
-Example:
+`get_service_info` returns service metadata, the list of tools, and the current version of the employee dataset. `sync_employees` returns a full snapshot or a delta based on `client_hash`; the dataset checksum is built from the employees, their statuses, and their current checksums. Tool responses are returned as a JSON string inside MCP `content[0].text`.
 There is no separate MCP tool for employee news: news items are available in `get_employee(...).data.sections` and `sync_employees(include_data=true)` as a section with `type = "news"` and a `news_links` array.
 Example of a local request for the list of tools:
 ```bash
 curl http://localhost:8001/mcp \
-  -H "Authorization: Bearer change-me-mcp-token" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'
 ```
 If MCP needs to be restricted, do it at the network level: localhost binding, VPN, firewall, reverse proxy, or another external access layer.
 ## Maintenance
 ```bash
@@ -110,4 +139,4 @@ docker compose exec postgres pg_dump -U miem miem_workers > backup.sql
 docker compose down
 ```
-Service version: `0.2.4`. The admin panel always shows the backend and frontend versions in the footer.
+Service version: `0.7.0`. The admin panel always shows the backend and frontend versions in the footer.
--- a/app/admin.py
+++ b/app/admin.py
@@ -1,15 +1,23 @@
 from fastapi import APIRouter, BackgroundTasks, Depends, Form, Request
 from fastapi.responses import HTMLResponse, RedirectResponse
 from fastapi.templating import Jinja2Templates
-from sqlalchemy import desc, func, or_, select
+from sqlalchemy import desc, func, select
 from sqlalchemy.orm import Session
 from app.config import Settings, get_settings
 from app.db import SessionLocal, get_db
 from app.models import CrawlError, CrawlRun, Employee
 from app.security import SESSION_COOKIE, require_admin, sign_session, verify_admin
-from app.services.admin_data import employee_detail_payload, list_employees_page, run_payload, stats_payload
+from app.services.admin_data import (
    employee_detail_payload,
    format_admin_datetime,
    list_employees_page,
    run_detail_payload,
    run_payload,
    stats_payload,
 )
 from app.services.crawl_control import get_running_run, run_crawl_if_idle
 from app.services.crawler import refresh_employee
 from app.version import BACKEND_VERSION, FRONTEND_VERSION
 router = APIRouter(prefix="/admin")
@@ -22,8 +30,9 @@ def dashboard(request: Request, db: Session = Depends(get_db), settings: Setting
    counts = stats_payload(db)
    counts["runs"] = db.scalar(select(func.count()).select_from(CrawlRun)) or 0
    counts["errors"] = db.scalar(select(func.count()).select_from(CrawlError)) or 0
-    runs = db.scalars(select(CrawlRun).order_by(desc(CrawlRun.started_at)).limit(10)).all()
+    run_models = db.scalars(select(CrawlRun).order_by(desc(CrawlRun.started_at)).limit(5)).all()
-    return _render(request, "dashboard.html", {"counts": counts, "runs": runs, "latest_run": run_payload(runs[0]) if runs else None})
+    runs = [run_payload(run) for run in run_models]
    return _render(request, "dashboard.html", {"counts": counts, "runs": runs, "latest_run": runs[0] if runs else None})
@router.get("/login", response_class=HTMLResponse)
@@ -57,18 +66,10 @@ def employees(
    request: Request,
    status: str | None = None,
    q: str | None = None,
    db: Session = Depends(get_db),
    settings: Settings = Depends(get_settings),
 ):
    require_admin(request, settings)
-    stmt = select(Employee)
+    return RedirectResponse("/admin/directory", status_code=303)
-    if status:
-        stmt = stmt.where(Employee.status == status)
-    if q:
-        pattern = f"%{q}%"
-        stmt = stmt.where(or_(Employee.full_name.ilike(pattern), Employee.canonical_url.ilike(pattern)))
-    items = db.scalars(stmt.order_by(Employee.full_name).limit(200)).all()
-    return _render(request, "employees.html", {"employees": items, "status": status or "", "q": q or ""})
@router.get("/directory", response_class=HTMLResponse)
@@ -115,7 +116,7 @@ def directory(
                "has_email": has_email or "",
                "sort": sort,
                "direction": direction,
-                "limit": limit,
+                "limit": page["limit"],
                "offset": offset,
            },
        },
@@ -133,22 +134,65 @@ def employee_detail(
    employee = db.get(Employee, employee_id)
    if not employee:
        return RedirectResponse("/admin/employees", status_code=303)
-    snapshots = sorted(employee.snapshots, key=lambda item: item.captured_at, reverse=True)[:20]
+    snapshots = [
        {
            "captured_display": format_admin_datetime(snapshot.captured_at),
            "checksum": snapshot.checksum,
            "parser_version": snapshot.parser_version,
        }
        for snapshot in sorted(employee.snapshots, key=lambda item: item.captured_at, reverse=True)[:20]
    ]
    return _render(
        request,
        "employee_detail.html",
-        {"employee": employee, "employee_view": employee_detail_payload(employee), "snapshots": snapshots},
+        {
            "employee": employee,
            "employee_view": employee_detail_payload(employee),
            "snapshots": snapshots,
            "refresh_status": request.query_params.get("refresh_status"),
        },
    )
@router.post("/employees/{employee_id}/refresh")
 def refresh_employee_detail(
    employee_id: int,
    request: Request,
    db: Session = Depends(get_db),
    settings: Settings = Depends(get_settings),
 ):
    require_admin(request, settings)
    employee = db.get(Employee, employee_id)
    if not employee:
        return RedirectResponse("/admin/directory", status_code=303)
    run = refresh_employee(db, employee, settings)
    status = "success" if run.status == "completed" else "error"
    return RedirectResponse(f"/admin/employees/{employee_id}?refresh_status={status}", status_code=303)
@router.get("/runs", response_class=HTMLResponse)
 def runs(request: Request, db: Session = Depends(get_db), settings: Settings = Depends(get_settings)):
    require_admin(request, settings)
-    items = db.scalars(select(CrawlRun).order_by(desc(CrawlRun.started_at)).limit(50)).all()
+    run_models = db.scalars(select(CrawlRun).order_by(desc(CrawlRun.started_at)).limit(50)).all()
    items = [run_payload(run) for run in run_models]
    errors = db.scalars(select(CrawlError).order_by(desc(CrawlError.created_at)).limit(50)).all()
    return _render(request, "runs.html", {"runs": items, "errors": errors})
@router.get("/runs/{run_id}", response_class=HTMLResponse)
 def run_detail(
    run_id: int,
    request: Request,
    db: Session = Depends(get_db),
    settings: Settings = Depends(get_settings),
 ):
    require_admin(request, settings)
    run = db.get(CrawlRun, run_id)
    if not run:
        return RedirectResponse("/admin/runs", status_code=303)
    return _render(request, "run_detail.html", {"run": run_detail_payload(db, run)})
@router.post("/runs")
 def trigger_run(
    request: Request,
--- a/app/api.py
+++ b/app/api.py
@@ -8,7 +8,7 @@ from app.config import Settings, get_settings
 from app.db import SessionLocal, get_db
 from app.models import CrawlRun, Employee
 from app.security import require_admin
-from app.services.admin_data import employee_display_payload, list_employees_page, run_payload, stats_payload
+from app.services.admin_data import employee_display_payload, list_employees_page, run_detail_payload, run_payload, stats_payload
 from app.services.crawl_control import get_running_run, run_crawl_if_idle
 from app.version import BACKEND_VERSION, FRONTEND_VERSION
@@ -88,6 +88,20 @@ def latest_crawl_run(
    return {"running": run_payload(running), "latest": run_payload(latest)}
@router.get("/crawl-runs/{run_id}")
 def get_crawl_run(
    run_id: int,
    request: Request,
    db: Session = Depends(get_db),
    settings: Settings = Depends(get_settings),
 ) -> dict:
    require_admin(request, settings)
    run = db.get(CrawlRun, run_id)
    if not run:
        return {"error": "not_found"}
    return run_detail_payload(db, run) or {"error": "not_found"}
@router.post("/crawl-runs")
 def trigger_crawl(
    request: Request,
--- a/app/config.py
+++ b/app/config.py
@@ -1,5 +1,5 @@
 from functools import lru_cache
-from pydantic import Field
+from pydantic import Field, field_validator
 from pydantic_settings import BaseSettings, SettingsConfigDict
@@ -17,8 +17,13 @@ class Settings(BaseSettings):
    admin_username: str = "admin"
    admin_password: str = "admin"
    session_secret: str = Field(default="dev-session-secret", min_length=8)
    mcp_token: str = "dev-mcp-token"
    @field_validator("crawl_limit", mode="before")
    @classmethod
    def empty_crawl_limit_as_none(cls, value):
        if value == "":
            return None
        return value
@lru_cache
 def get_settings() -> Settings:
--- a/app/db.py
+++ b/app/db.py
@@ -1,6 +1,6 @@
 from collections.abc import Generator
-from sqlalchemy import create_engine
+from sqlalchemy import create_engine, inspect, text
 from sqlalchemy.orm import DeclarativeBase, Session, sessionmaker
 from app.config import get_settings
@@ -25,6 +25,28 @@ def init_db() -> None:
    import app.models  # noqa: F401
    Base.metadata.create_all(bind=engine)
    _ensure_runtime_schema()
 def _ensure_runtime_schema() -> None:
    import app.models as models
    inspector = inspect(engine)
    table_names = set(inspector.get_table_names())
    if "employees" in table_names and "employee_publications" not in table_names:
        models.EmployeePublication.__table__.create(bind=engine, checkfirst=True)
        inspector = inspect(engine)
        table_names = set(inspector.get_table_names())
    if "employees" in table_names and "employee_news_links" not in table_names:
        models.EmployeeNewsLink.__table__.create(bind=engine, checkfirst=True)
        inspector = inspect(engine)
        table_names = set(inspector.get_table_names())
    if "crawl_runs" not in table_names:
        return
    crawl_run_columns = {column["name"] for column in inspector.get_columns("crawl_runs")}
    if "skipped_count" not in crawl_run_columns:
        with engine.begin() as connection:
            connection.execute(text("ALTER TABLE crawl_runs ADD COLUMN skipped_count INTEGER NOT NULL DEFAULT 0"))
 def get_db() -> Generator[Session, None, None]:
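Review note: `_ensure_runtime_schema` follows an "inspect, then alter only if missing" pattern, which makes repeated startups safe. The sketch below shows the same idea with the standard-library `sqlite3` module instead of SQLAlchemy's inspector. `ensure_column` is a hypothetical helper, and the in-memory table is only a stand-in for `crawl_runs`.

```python
import sqlite3


def ensure_column(conn: sqlite3.Connection, table: str, column: str, ddl: str) -> bool:
    """Add `column` to `table` if it is missing; return True when a change was made."""
    # PRAGMA table_info yields one row per column; index 1 holds the column name.
    # Identifiers are interpolated directly, so they must be trusted constants.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crawl_runs (id INTEGER PRIMARY KEY)")
first = ensure_column(conn, "crawl_runs", "skipped_count", "INTEGER NOT NULL DEFAULT 0")
second = ensure_column(conn, "crawl_runs", "skipped_count", "INTEGER NOT NULL DEFAULT 0")
```

The first call adds the column and the second is a no-op, which is the property the runtime check relies on when the application starts against an already upgraded database.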
--- a/app/mcp.py
+++ b/app/mcp.py
@@ -4,15 +4,34 @@ from fastapi import APIRouter, Depends, Request
 from sqlalchemy import desc, or_, select
 from sqlalchemy.orm import Session
 from app.config import Settings, get_settings
 from app.db import get_db
-from app.models import CrawlRun, Employee
+from app.models import CrawlRun, Employee, EmployeePublication
-from app.security import require_mcp_token
+from app.services.admin_data import run_detail_payload
 from app.services.dataset_versions import service_info_payload, sync_employees_payload
 from app.version import BACKEND_VERSION
 router = APIRouter(prefix="/mcp")
 PROTOCOL_VERSION = "2024-11-05"
 SERVICE_NAME = "miem-employees"
 TOOLS = [
    {
        "name": "get_service_info",
        "description": "Return service metadata, supported tools, and current dataset version.",
        "inputSchema": {"type": "object", "properties": {}},
    },
    {
        "name": "sync_employees",
        "description": "Synchronize employees by dataset hash. Returns a full snapshot or a delta from client_hash.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "client_hash": {"type": "string"},
                "include_data": {"type": "boolean", "default": True},
            },
        },
    },
    {
        "name": "search_employees",
        "description": "Search MIEM employees by name or profile URL.",
@@ -33,7 +52,10 @@ TOOLS = [
    },
    {
        "name": "list_employee_publications",
-        "description": "List publications parsed from an employee profile.",
+        "description": (
            "List employee publications with detailed fields when available: authors, DOI URL, annotation, "
            "description, citation text, year, publication type, language, status, and HSE Publications URL."
        ),
        "inputSchema": {"type": "object", "properties": {"profile_id_or_url": {"type": "string"}}, "required": ["profile_id_or_url"]},
    },
    {
@@ -46,6 +68,15 @@ TOOLS = [
        "description": "Return the latest crawl run status.",
        "inputSchema": {"type": "object", "properties": {}},
    },
    {
        "name": "get_crawl_run_details",
        "description": "Return detailed employee changes and errors for one crawl run.",
        "inputSchema": {
            "type": "object",
            "properties": {"run_id": {"type": "integer"}},
            "required": ["run_id"],
        },
    },
 ]
@@ -53,9 +84,7 @@ TOOLS = [
 async def mcp_http(
    request: Request,
    db: Session = Depends(get_db),
-    settings: Settings = Depends(get_settings),
 ) -> dict:
-    require_mcp_token(request, settings)
    payload = await request.json()
    method = payload.get("method")
    request_id = payload.get("id")
@@ -64,8 +93,8 @@ async def mcp_http(
    try:
        if method == "initialize":
            result = {
-                "protocolVersion": "2024-11-05",
+                "protocolVersion": PROTOCOL_VERSION,
-                "serverInfo": {"name": "miem-employees", "version": "0.1.0"},
+                "serverInfo": {"name": SERVICE_NAME, "version": BACKEND_VERSION},
                "capabilities": {"tools": {}},
            }
        elif method == "tools/list":
@@ -80,6 +109,24 @@ async def mcp_http(
 def _call_tool(db: Session, name: str, arguments: dict) -> dict:
    if name == "get_service_info":
        return _tool_response(
            service_info_payload(
                db,
                tools=TOOLS,
                service_name=SERVICE_NAME,
                backend_version=BACKEND_VERSION,
                protocol_version=PROTOCOL_VERSION,
            )
        )
    if name == "sync_employees":
        return _tool_response(
            sync_employees_payload(
                db,
                client_hash=arguments.get("client_hash"),
                include_data=bool(arguments.get("include_data", True)),
            )
        )
    if name == "search_employees":
        return _tool_response(_search_employees(db, arguments))
    if name == "get_employee":
@@ -94,6 +141,9 @@ def _call_tool(db: Session, name: str, arguments: dict) -> dict:
    if name == "get_crawl_status":
        run = db.scalar(select(CrawlRun).order_by(desc(CrawlRun.started_at)).limit(1))
        return _tool_response(_run_payload(run) if run else {"status": "never_run"})
    if name == "get_crawl_run_details":
        run = db.get(CrawlRun, int(arguments["run_id"]))
        return _tool_response(run_detail_payload(db, run) if run else {"error": "not_found"})
    raise ValueError(f"Unknown tool: {name}")
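Review note: a client-side sketch of calling the new `get_crawl_run_details` tool. It assumes the endpoint dispatches the standard MCP `tools/call` method to `_call_tool` (the dispatch branch is not visible in this part of the diff). `build_tool_call` is a hypothetical helper and the run id is made up.

```python
import json


def build_tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request body for an MCP tool call."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# run_id must be an integer according to the tool's inputSchema.
body = build_tool_call(1, "get_crawl_run_details", {"run_id": 42})
encoded = json.dumps(body)
```

The encoded body would be sent with `POST /mcp`. For an unknown `run_id`, the handler in this diff wraps `{"error": "not_found"}` in a normal tool response rather than returning a JSON-RPC error.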
@@ -124,8 +174,14 @@ def _find_employee(db: Session, value: str) -> Employee | None:
 def _collect_section_items(employee: Employee | None, section_type: str) -> dict:
-    if not employee or not employee.current_data:
+    if not employee:
        return {"items": []}
    if section_type == "publications":
        publications = _stored_publications(employee)
        if publications:
            return {"employee": _employee_payload(employee, include_data=False), "items": publications}
    if not employee.current_data:
        return {"employee": _employee_payload(employee, include_data=False), "items": []}
    items = []
    for section in employee.current_data.get("sections") or []:
        if section.get("type") != section_type:
@@ -137,6 +193,41 @@ def _collect_section_items(employee: Employee | None, section_type: str) -> dict
    return {"employee": _employee_payload(employee, include_data=False), "items": items}
 def _stored_publications(employee: Employee) -> list[dict]:
    return [_publication_payload(publication) for publication in sorted(employee.publications, key=_publication_sort_key)]
 def _publication_sort_key(publication: EmployeePublication) -> tuple:
    return (publication.year or 0, publication.title or "", publication.id)
 def _publication_payload(publication: EmployeePublication) -> dict:
    text = publication.citation_text or publication.title
    payload = {
        "id": publication.publication_id,
        "publication_id": publication.publication_id,
        "title": publication.title,
        "text": text,
        "url": publication.url,
    }
    optional = {
        "year": publication.year,
        "type": publication.publication_type,
        "publication_type": publication.publication_type,
        "language": publication.language,
        "status": publication.status,
        "doi_url": publication.doi_url,
        "other_url": publication.other_url,
        "document_url": publication.document_url,
        "citation_text": publication.citation_text,
        "annotation": publication.annotation,
        "description": publication.description,
        "authors": publication.authors,
    }
    payload.update({key: value for key, value in optional.items() if value not in (None, [], {})})
    return payload
 def _employee_payload(employee: Employee, include_data: bool = True) -> dict:
    payload = {
        "profile_key": employee.profile_key,
@@ -161,6 +252,7 @@ def _run_payload(run: CrawlRun) -> dict:
        "finished_at": run.finished_at.isoformat() if run.finished_at else None,
        "found_count": run.found_count,
        "parsed_count": run.parsed_count,
        "skipped_count": run.skipped_count,
        "error_count": run.error_count,
        "dismissed_count": run.dismissed_count,
    }
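Review note: `_publication_payload` drops optional fields with the predicate `value not in (None, [], {})`. The sketch below isolates that predicate to show which values survive. `drop_empty` and the sample values are illustrative only.

```python
def drop_empty(optional: dict) -> dict:
    """Keep a field unless its value is None, an empty list, or an empty dict."""
    return {key: value for key, value in optional.items() if value not in (None, [], {})}


sample = {
    "year": 2024,
    "doi_url": None,   # dropped
    "authors": [],     # dropped
    "annotation": {},  # dropped
    "status": 0,       # kept: 0 is not equal to None, [] or {}
    "language": "",    # kept: empty strings are not filtered
}
result = drop_empty(sample)
```

Here `result` keeps `year`, `status`, and `language`. A `status` of `0` stays in the payload, and so does an empty string, so consumers should not assume every present string field is non-empty.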
--- a/app/models.py
+++ b/app/models.py
@@ -41,6 +41,9 @@ class Employee(Base):
    snapshots: Mapped[list["EmployeeSnapshot"]] = relationship(back_populates="employee")
    tabs: Mapped[list["ProfileTab"]] = relationship(back_populates="employee", cascade="all, delete-orphan")
    publications: Mapped[list["EmployeePublication"]] = relationship(back_populates="employee", cascade="all, delete-orphan")
    news_links: Mapped[list["EmployeeNewsLink"]] = relationship(back_populates="employee", cascade="all, delete-orphan")
    crawl_run_changes: Mapped[list["CrawlRunEmployeeChange"]] = relationship(back_populates="employee")
 class EmployeeSnapshot(Base):
@@ -59,6 +62,68 @@ class EmployeeSnapshot(Base):
    employee: Mapped[Employee] = relationship(back_populates="snapshots")
 class EmployeePublication(Base):
    __tablename__ = "employee_publications"
    __table_args__ = (
        UniqueConstraint("employee_id", "publication_id", name="uq_employee_publications_employee_publication"),
        UniqueConstraint("employee_id", "source_hash", name="uq_employee_publications_employee_source_hash"),
        Index("ix_employee_publications_employee_id", "employee_id"),
        Index("ix_employee_publications_publication_id", "publication_id"),
        Index("ix_employee_publications_doi_url", "doi_url"),
        Index("ix_employee_publications_year", "year"),
        Index("ix_employee_publications_publication_type", "publication_type"),
    )
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    employee_id: Mapped[int] = mapped_column(ForeignKey("employees.id", ondelete="CASCADE"), nullable=False)
    publication_id: Mapped[str | None] = mapped_column(String(64))
    title: Mapped[str] = mapped_column(Text, nullable=False)
    year: Mapped[int | None] = mapped_column(Integer)
    publication_type: Mapped[str | None] = mapped_column(String(64))
    language: Mapped[str | None] = mapped_column(String(16))
    status: Mapped[int | None] = mapped_column(Integer)
    url: Mapped[str | None] = mapped_column(Text)
    doi_url: Mapped[str | None] = mapped_column(Text)
    other_url: Mapped[str | None] = mapped_column(Text)
    document_url: Mapped[str | None] = mapped_column(Text)
    citation_text: Mapped[str | None] = mapped_column(Text)
    annotation: Mapped[dict | None] = mapped_column(json_type)
    description: Mapped[dict | None] = mapped_column(json_type)
    authors: Mapped[list | None] = mapped_column(json_type)
    raw_data: Mapped[dict | None] = mapped_column(json_type)
    source_hash: Mapped[str] = mapped_column(String(64), nullable=False)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow, nullable=False)
    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow, onupdate=utcnow, nullable=False)
    employee: Mapped[Employee] = relationship(back_populates="publications")
 class EmployeeNewsLink(Base):
    __tablename__ = "employee_news_links"
    __table_args__ = (
        UniqueConstraint("employee_id", "url", name="uq_employee_news_links_employee_url"),
        UniqueConstraint("employee_id", "source_hash", name="uq_employee_news_links_employee_source_hash"),
        Index("ix_employee_news_links_employee_id", "employee_id"),
        Index("ix_employee_news_links_url", "url"),
        Index("ix_employee_news_links_published_at", "published_at"),
        Index("ix_employee_news_links_published_year", "published_year"),
    )
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    employee_id: Mapped[int] = mapped_column(ForeignKey("employees.id", ondelete="CASCADE"), nullable=False)
    title: Mapped[str] = mapped_column(Text, nullable=False)
    url: Mapped[str | None] = mapped_column(Text)
    summary: Mapped[str | None] = mapped_column(Text)
    published_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
    published_year: Mapped[int | None] = mapped_column(Integer)
    source_hash: Mapped[str] = mapped_column(String(64), nullable=False)
    raw_data: Mapped[dict | None] = mapped_column(json_type)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow, nullable=False)
    updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow, onupdate=utcnow, nullable=False)
    employee: Mapped[Employee] = relationship(back_populates="news_links")
 class CrawlRun(Base):
    __tablename__ = "crawl_runs"
@@ -69,11 +134,38 @@ class CrawlRun(Base):
    finished_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True))
    found_count: Mapped[int] = mapped_column(Integer, default=0, nullable=False)
    parsed_count: Mapped[int] = mapped_column(Integer, default=0, nullable=False)
    skipped_count: Mapped[int] = mapped_column(Integer, default=0, nullable=False)
    new_count: Mapped[int] = mapped_column(Integer, default=0, nullable=False)
    error_count: Mapped[int] = mapped_column(Integer, default=0, nullable=False)
    dismissed_count: Mapped[int] = mapped_column(Integer, default=0, nullable=False)
    message: Mapped[str | None] = mapped_column(Text)
    employee_changes: Mapped[list["CrawlRunEmployeeChange"]] = relationship(back_populates="crawl_run")
    dataset_versions: Mapped[list["DatasetVersion"]] = relationship(back_populates="crawl_run")
 class CrawlRunEmployeeChange(Base):
    __tablename__ = "crawl_run_employee_changes"
    __table_args__ = (
        Index("ix_crawl_run_employee_changes_run_id", "crawl_run_id"),
        Index("ix_crawl_run_employee_changes_employee_id", "employee_id"),
        Index("ix_crawl_run_employee_changes_change_type", "change_type"),
    )
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    crawl_run_id: Mapped[int] = mapped_column(ForeignKey("crawl_runs.id"), nullable=False)
    employee_id: Mapped[int | None] = mapped_column(ForeignKey("employees.id"))
    profile_key: Mapped[str] = mapped_column(String(255), nullable=False)
    profile_url: Mapped[str] = mapped_column(Text, nullable=False)
    full_name: Mapped[str | None] = mapped_column(Text)
    change_type: Mapped[str] = mapped_column(String(32), nullable=False)
    profile_available: Mapped[bool | None] = mapped_column()
    message: Mapped[str | None] = mapped_column(Text)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow, nullable=False)
    crawl_run: Mapped[CrawlRun] = relationship(back_populates="employee_changes")
    employee: Mapped[Employee | None] = relationship(back_populates="crawl_run_changes")
 class CrawlError(Base):
    __tablename__ = "crawl_errors"
@@ -108,3 +200,63 @@ class ParserSource(Base):
    source_url: Mapped[str] = mapped_column(Text, nullable=False)
    enabled: Mapped[bool] = mapped_column(default=True, nullable=False)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow, nullable=False)
 class ParseResourceCache(Base):
    __tablename__ = "parse_resource_cache"
    __table_args__ = (
        UniqueConstraint("profile_key", "resource_key", "request_fingerprint", name="uq_parse_resource_cache_resource"),
        Index("ix_parse_resource_cache_profile_key", "profile_key"),
    )
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    profile_key: Mapped[str] = mapped_column(String(255), nullable=False)
    resource_key: Mapped[str] = mapped_column(String(255), nullable=False)
    method: Mapped[str] = mapped_column(String(16), nullable=False)
    url: Mapped[str] = mapped_column(Text, nullable=False)
    request_fingerprint: Mapped[str] = mapped_column(String(64), nullable=False)
    etag: Mapped[str | None] = mapped_column(Text)
    last_modified: Mapped[str | None] = mapped_column(Text)
    body_hash: Mapped[str] = mapped_column(String(64), nullable=False)
    body_snapshot: Mapped[bytes] = mapped_column(LargeBinary, nullable=False)
    parser_version: Mapped[str | None] = mapped_column(String(32))
    fetched_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow, nullable=False)
 class DatasetVersion(Base):
    __tablename__ = "dataset_versions"
    __table_args__ = (
        UniqueConstraint("hash", name="uq_dataset_versions_hash"),
        Index("ix_dataset_versions_created_at", "created_at"),
    )
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    hash: Mapped[str] = mapped_column(String(64), nullable=False)
    previous_hash: Mapped[str | None] = mapped_column(String(64))
    crawl_run_id: Mapped[int | None] = mapped_column(ForeignKey("crawl_runs.id"))
    employee_count: Mapped[int] = mapped_column(Integer, default=0, nullable=False)
    active_count: Mapped[int] = mapped_column(Integer, default=0, nullable=False)
    dismissed_count: Mapped[int] = mapped_column(Integer, default=0, nullable=False)
    created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow, nullable=False)
    crawl_run: Mapped[CrawlRun | None] = relationship(back_populates="dataset_versions")
    items: Mapped[list["DatasetVersionItem"]] = relationship(back_populates="dataset_version", cascade="all, delete-orphan")
 class DatasetVersionItem(Base):
    __tablename__ = "dataset_version_items"
    __table_args__ = (
        UniqueConstraint("dataset_version_id", "profile_key", name="uq_dataset_version_items_version_profile"),
        Index("ix_dataset_version_items_hash", "dataset_version_id"),
        Index("ix_dataset_version_items_profile_key", "profile_key"),
    )
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    dataset_version_id: Mapped[int] = mapped_column(ForeignKey("dataset_versions.id"), nullable=False)
    profile_key: Mapped[str] = mapped_column(String(255), nullable=False)
    employee_id: Mapped[int | None] = mapped_column(ForeignKey("employees.id"))
    status: Mapped[str] = mapped_column(String(32), nullable=False)
    checksum: Mapped[str] = mapped_column(String(64), nullable=False)
    dataset_version: Mapped[DatasetVersion] = relationship(back_populates="items")
    employee: Mapped[Employee | None] = relationship()
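Review note: `EmployeePublication` and `EmployeeNewsLink` both carry a 64-character `source_hash` with a per-employee unique constraint. The code that computes the hash is not shown in this part of the diff. The sketch below is one possible approach, assuming SHA-256 over canonical JSON. `stable_hash` is a hypothetical name and the sample URL is a placeholder.

```python
import hashlib
import json


def stable_hash(payload: dict) -> str:
    """Return a 64-character hex digest that does not depend on key order."""
    # sort_keys and fixed separators give one canonical text form per payload.
    canonical = json.dumps(payload, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = stable_hash({"title": "Новость", "url": "https://example.org/news/1"})
b = stable_hash({"url": "https://example.org/news/1", "title": "Новость"})
```

Both calls produce the same digest, so re-parsing an unchanged item would map onto the existing row instead of violating the `(employee_id, source_hash)` constraint.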
--- a/app/parser/profile.py
+++ b/app/parser/profile.py
@@ -1,4 +1,7 @@
 import hashlib
 import json
 import re
 from datetime import datetime, timezone
 from urllib.parse import urljoin
 from bs4 import BeautifulSoup, NavigableString, Tag
@@ -99,6 +102,8 @@ def extract_person_header(soup: BeautifulSoup, source_url: str) -> dict:
 def extract_sections(soup: BeautifulSoup, source_url: str) -> list[dict]:
    sections = []
    for h2 in soup.select("h2"):
        if h2.find_parent(class_="post") or h2.find_parent(attrs={"data-tab": "press_links_news"}):
            continue
        title = normalize_ws(h2.get_text(" ", strip=True))
        if not title or "расписание занятий" in title.lower():
            continue
@@ -140,6 +145,21 @@ def extract_sections(soup: BeautifulSoup, source_url: str) -> list[dict]:
            if section_type in {"generic", "paragraphs"}:
                section["type"] = "year_blocks"
        sections.append(section)
    news_links = _parse_news_links(soup, source_url)
    if news_links:
        sections.append(
            {
                "title": "В новостях",
                "slug": "v_novostyah",
                "type": "news",
                "raw_text": "",
                "paragraphs": [],
                "items": [item["title"] for item in news_links if item.get("title")],
                "links": [{"text": item["title"], "url": item["url"]} for item in news_links if item.get("title") and item.get("url")],
                "news_count": len(news_links),
                "news_links": news_links,
            }
        )
    return sections
@@ -149,21 +169,42 @@ def parse_person_profile(
    headers: dict[str, str],
    timeout: int,
    use_playwright: bool = False,
    resource_cache=None,
 ) -> dict | None:
    normalized_url = normalize_profile_url(source_url)
    if not normalized_url:
        return None
-    response = session.get(normalized_url, headers=headers, timeout=timeout)
+    profile_type, profile_id = parse_profile_identity(normalized_url)
-    response.raise_for_status()
+    cache_profile_key = f"{profile_type}:{profile_id}"
-    html = response.text
+    resource_manifest = []
    html = _fetch_text(
        session,
        normalized_url,
        headers,
        timeout,
        resource_cache=resource_cache,
        profile_key=cache_profile_key,
        resource_key="main-html",
        resource_manifest=resource_manifest,
    )
    if use_playwright:
        html = _render_with_playwright(normalized_url, html)
    soup = BeautifulSoup(html, "html.parser")
    profile_type, profile_id = parse_profile_identity(normalized_url)
    header = extract_person_header(soup, normalized_url)
    tabs = extract_person_tabs(soup, normalized_url)
    sections = extract_sections(soup, normalized_url)
    sections = enrich_sections_from_hse_widgets(
        session,
        soup,
        normalized_url,
        headers,
        timeout,
        sections,
        resource_cache=resource_cache,
        profile_key=cache_profile_key,
        resource_manifest=resource_manifest,
    )
    internal_links = [tab["href"] for tab in tabs if tab.get("href")]
    return {
@@ -180,9 +221,49 @@ def parse_person_profile(
        "employee_internal_links": internal_links,
        "parser_version": BACKEND_VERSION,
        "_html": html,
        "_resource_manifest": resource_manifest,
    }
 def enrich_sections_from_hse_widgets(
    session: Session,
    soup: BeautifulSoup,
    source_url: str,
    headers: dict[str, str],
    timeout: int,
    sections: list[dict],
    resource_cache=None,
    profile_key: str | None = None,
    resource_manifest: list[dict] | None = None,
 ) -> list[dict]:
    enriched = list(sections)
    publications = _load_widget_publications(
        session,
        soup,
        headers,
        timeout,
        resource_cache=resource_cache,
        profile_key=profile_key,
        resource_manifest=resource_manifest,
    )
    if publications:
        enriched = _upsert_publications_section(enriched, publications)
    theses = _load_widget_graduation_theses(
        session,
        soup,
        source_url,
        headers,
        timeout,
        resource_cache=resource_cache,
        profile_key=profile_key,
        resource_manifest=resource_manifest,
    )
    if theses:
        enriched = _upsert_graduation_theses_section(enriched, theses)
    return enriched
 def _render_with_playwright(source_url: str, fallback_html: str) -> str:
    try:
        from playwright.sync_api import sync_playwright
@@ -206,6 +287,161 @@ def _render_with_playwright(source_url: str, fallback_html: str) -> str:
        return fallback_html
 def _load_widget_publications(
    session: Session,
    soup: BeautifulSoup,
    headers: dict[str, str],
    timeout: int,
    *,
    resource_cache=None,
    profile_key: str | None = None,
    resource_manifest: list[dict] | None = None,
 ) -> list[dict]:
    script = soup.select_one('script[data-widget-name="AuthorSearch"][data-author]')
    if not script:
        return []
    author_id = normalize_ws(script.get("data-author"))
    if not author_id:
        return []
    publications = []
    page_id = 1
    per_page = 100
    while page_id <= 20:
        payload = {
            "type": "ANY",
            "filterParams": (
                f'"acceptLanguage":"ru"|"fullTextPublicEnabled": 1|'
                f'"pubsAuthor": {author_id}|"widgetName": "AuthorSearch"'
            ),
            "paginationParams": {
                "publsSort": ["TITLE_ASC"],
                "publsCount": per_page,
                "pageId": page_id,
            },
        }
        try:
            if resource_cache and profile_key:
                text = _fetch_text(
                    session,
                    "https://publications.hse.ru/api/searchPubs",
                    headers,
                    timeout,
                    resource_cache=resource_cache,
                    profile_key=profile_key,
                    resource_key=f"publications-page-{page_id}",
                    resource_manifest=resource_manifest,
                    method="POST",
                    json_payload=payload,
                )
                data = json.loads(text)
            else:
                response = session.post(
                    "https://publications.hse.ru/api/searchPubs",
                    json=payload,
                    headers=headers,
                    timeout=timeout,
                )
                response.raise_for_status()
                data = response.json()
        except Exception:
            return publications
        result = data.get("result") if isinstance(data, dict) else {}
        items = _extract_publication_items(result)
        if not items:
            break
        publications.extend(_normalize_publication_item(item, author_id) for item in items)
        total = int(result.get("total") or 0)
        if not result.get("more") and len(publications) >= total:
            break
        page_id += 1
    return _dedupe_publications(publications)
 def _extract_publication_items(result: object) -> list[dict]:
    if not isinstance(result, dict):
        return []
    return _flatten_publication_items(result.get("items"))
 def _flatten_publication_items(value: object) -> list[dict]:
    if isinstance(value, list):
        return [item for item in value if _is_publication_item(item)]
    if not isinstance(value, dict):
        return []
    nested_items = value.get("items")
    if isinstance(nested_items, list):
        return [item for item in nested_items if _is_publication_item(item)]
    if isinstance(nested_items, dict):
        return _flatten_publication_items(nested_items)
    publications = []
    for child in value.values():
        publications.extend(_flatten_publication_items(child))
    return publications
 def _is_publication_item(value: object) -> bool:
    return isinstance(value, dict) and ("id" in value or "title" in value)
 def _load_widget_graduation_theses(
    session: Session,
    soup: BeautifulSoup,
    source_url: str,
    headers: dict[str, str],
    timeout: int,
    *,
    resource_cache=None,
    profile_key: str | None = None,
    resource_manifest: list[dict] | None = None,
 ) -> list[dict]:
    script = soup.select_one('script[src*="/n/stat/vkr/app.js"][data-person-id]')
    if not script:
        return []
    person_id = normalize_ws(script.get("data-person-id"))
    api_url = normalize_ws(script.get("data-api-url")) or "/n/vkr/api/"
    if not person_id:
        return []
    request_headers = {**headers, "x-portal-language": "ru"}
    try:
        url = urljoin(source_url, api_url)
        params = {"supervisorId": person_id}
        if resource_cache and profile_key:
            text = _fetch_text(
                session,
                url,
                request_headers,
                timeout,
                resource_cache=resource_cache,
                profile_key=profile_key,
                resource_key="graduation-theses",
                resource_manifest=resource_manifest,
                params=params,
            )
            data = json.loads(text)
        else:
            response = session.get(
                url,
                params=params,
                headers=request_headers,
                timeout=timeout,
            )
            response.raise_for_status()
            data = response.json()
    except Exception:
        return []
    items = data.get("data") if isinstance(data, dict) else []
    if not isinstance(items, list):
        return []
    return [_normalize_vkr_item(item, source_url) for item in items if isinstance(item, dict)]
 def _collect_between_h2(start_h2: Tag) -> list[Tag | NavigableString | str]:
    nodes = []
    for sibling in start_h2.next_siblings:
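Review note: `_flatten_publication_items` accepts several response shapes. The sketch below copies the two helpers from this hunk, without type hints, so it runs standalone, and feeds them three made-up shapes: a flat list, a doubly nested `items` dict, and a dict grouped by arbitrary keys.

```python
def _is_publication_item(value):
    return isinstance(value, dict) and ("id" in value or "title" in value)


def _flatten_publication_items(value):
    if isinstance(value, list):
        return [item for item in value if _is_publication_item(item)]
    if not isinstance(value, dict):
        return []
    nested_items = value.get("items")
    if isinstance(nested_items, list):
        return [item for item in nested_items if _is_publication_item(item)]
    if isinstance(nested_items, dict):
        return _flatten_publication_items(nested_items)
    # No "items" key: walk every child value and collect what looks like a publication.
    publications = []
    for child in value.values():
        publications.extend(_flatten_publication_items(child))
    return publications


flat = _flatten_publication_items([{"id": 1}, "noise", {"title": "T"}])
nested = _flatten_publication_items({"items": {"items": [{"id": 2}]}})
grouped = _flatten_publication_items({"2023": [{"id": 3}], "2024": {"items": [{"id": 4}]}})
```

Non-dict entries such as `"noise"` are silently skipped. Note that when a dict has an `items` list, its sibling keys are never inspected, so any publications stored next to `items` would be ignored.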
@@ -256,7 +492,7 @@ def _infer_section_type(title: str, nodes: list) -> str:
    lowered = title.lower()
    if _has_table(nodes):
        return "table"
-    if "публикац" in lowered:
+    if _is_publications_title(lowered):
        return "publications"
    if "учебные курсы" in lowered:
        return "courses_by_year"
@@ -267,6 +503,10 @@ def _infer_section_type(title: str, nodes: list) -> str:
    return "generic"
 def _is_publications_title(lowered_title: str) -> bool:
    return lowered_title.startswith("публикац")
 def _has_table(nodes: list) -> bool:
    return any(isinstance(node, Tag) and (node.name == "table" or node.find("table")) for node in nodes)
@@ -353,6 +593,296 @@ def _parse_vkr_items(nodes: list) -> list[str]:
    return [item for item in dict.fromkeys(items) if item]
 def _parse_news_links(soup: BeautifulSoup, source_url: str) -> list[dict]:
    news = []
    for post in soup.select('[data-tab="press_links_news"] .post'):
        if not isinstance(post, Tag):
            continue
        anchor = post.select_one(".post__content h2 a[href], h2 a[href], a[href]")
        title = normalize_ws(anchor.get_text(" ", strip=True)) if anchor else ""
        href = normalize_ws(anchor.get("href")) if anchor else ""
        summary_node = post.select_one(".post__text")
        summary = normalize_ws(summary_node.get_text(" ", strip=True)) if summary_node else ""
        published_at = _parse_post_date(post)
        if not title and not href:
            continue
        item = {
            "title": title or href,
            "url": urljoin(source_url, href) if href else None,
            "summary": summary or None,
            "published_at": published_at.isoformat() if published_at else None,
            "published_year": published_at.year if published_at else _int_or_none(normalize_ws(_select_text(post, ".post-meta__year"))),
            "raw_data": {
                "title": title or href,
                "url": href or None,
                "summary": summary or None,
                "date_text": normalize_ws(_select_text(post, ".post-meta__date")),
            },
        }
        news.append(item)
    return _dedupe_news_links(news)
 def _select_text(node: Tag, selector: str) -> str:
    selected = node.select_one(selector)
    return selected.get_text(" ", strip=True) if selected else ""
 def _parse_post_date(post: Tag) -> datetime | None:
    day = _int_or_none(normalize_ws(_select_text(post, ".post-meta__day")))
    month = _month_number(normalize_ws(_select_text(post, ".post-meta__month")))
    year = _int_or_none(normalize_ws(_select_text(post, ".post-meta__year")))
    if not day or not month or not year:
        return None
    try:
        return datetime(year, month, day, tzinfo=timezone.utc)
    except ValueError:
        return None
 def _month_number(value: str) -> int | None:
    lowered = value.lower().strip(".")
    months = {
        "янв": 1,
        "январь": 1,
        "января": 1,
        "фев": 2,
        "февр": 2,
        "февраль": 2,
        "февраля": 2,
        "март": 3,
        "мар": 3,
        "марта": 3,
        "апр": 4,
        "апрель": 4,
        "апреля": 4,
        "май": 5,
        "мая": 5,
        "июнь": 6,
        "июня": 6,
        "июль": 7,
        "июля": 7,
        "авг": 8,
        "август": 8,
        "августа": 8,
        "сент": 9,
        "сен": 9,
        "сентябрь": 9,
        "сентября": 9,
        "окт": 10,
        "октябрь": 10,
        "октября": 10,
        "нояб": 11,
        "ноябрь": 11,
        "ноября": 11,
        "дек": 12,
        "декабрь": 12,
        "декабря": 12,
    }
    return months.get(lowered)
 def _normalize_publication_item(item: dict, current_author_id: str | None = None) -> dict:
    publication_id = str(item.get("id") or "").strip()
    title = _html_to_text(item.get("title"))
    year = _int_or_none(item.get("year"))
    publication_type = str(item.get("type") or "").strip() or None
    description = item.get("description") if isinstance(item.get("description"), dict) else {}
    short_description = _localized_value(description.get("short")) or _localized_value(description.get("shortLeft"))
    documents = item.get("documents") if isinstance(item.get("documents"), dict) else {}
    language = item.get("language") if isinstance(item.get("language"), dict) else {}
    annotation = _localized_text_map(item.get("annotation"))
    authors = _normalize_publication_authors(item.get("authorsByType"), current_author_id)
    citation_text = normalize_ws(str(description.get("main") or "")) or _build_publication_citation(title, authors, year)
    text = normalize_ws(" ".join(part for part in [title, str(year or ""), short_description] if part))
    return {
        "id": publication_id or None,
        "publication_id": publication_id or None,
        "title": title or publication_id,
        "year": year,
        "type": publication_type,
        "publication_type": publication_type,
        "language": normalize_ws(language.get("name")) or None,
        "status": _int_or_none(item.get("status")),
        "url": f"https://publications.hse.ru/view/{publication_id}" if publication_id else None,
        "doi_url": _document_href(documents, "DOI"),
        "other_url": _document_href(documents, "OTHER_URL"),
        "document_url": _document_href(documents, "DOCUMENT"),
        "citation_text": citation_text or None,
        "annotation": annotation,
        "description": description or None,
        "authors": authors,
        "raw_data": item,
        "text": text or title or publication_id,
    }
 def _normalize_vkr_item(item: dict, source_url: str) -> dict:
    thesis_id = item.get("id")
    program = item.get("learnProgram") if isinstance(item.get("learnProgram"), dict) else {}
    org_unit = item.get("orgUnit") if isinstance(item.get("orgUnit"), dict) else {}
    supervisors = []
    for supervisor in item.get("supervisors") or []:
        if not isinstance(supervisor, dict):
            continue
        name = normalize_ws(supervisor.get("name"))
        url = normalize_ws(supervisor.get("url"))
        if name or url:
            supervisors.append({"name": name or url, "url": url or None})
    return {
        "id": thesis_id,
        "student": normalize_ws(item.get("student")),
        "title": normalize_ws(item.get("title")),
        "defense_year": item.get("year"),
        "level": normalize_ws(item.get("level")),
        "rating": item.get("rating"),
        "project_url": urljoin(source_url, f"/edu/vkr/{thesis_id}") if thesis_id else None,
        "program": normalize_ws(program.get("title")),
        "program_url": urljoin(source_url, program.get("url")) if program.get("url") else None,
        "org_unit": normalize_ws(org_unit.get("title")),
        "org_unit_url": urljoin(source_url, org_unit.get("url")) if org_unit.get("url") else None,
        "supervisors": supervisors,
        "text": normalize_ws(" ".join(str(part) for part in [item.get("student"), item.get("title"), item.get("year")] if part)),
    }
 def _upsert_publications_section(sections: list[dict], publications: list[dict]) -> list[dict]:
    merged = []
    inserted = False
    for section in sections:
        if section.get("type") != "publications":
            merged.append(section)
            continue
        existing = section.get("publications") or []
        section = {
            **section,
            "publications_count": max(section.get("publications_count") or 0, len(publications)),
            "publications": _dedupe_publications([*existing, *publications]),
        }
        section["items"] = [item["text"] for item in section["publications"] if item.get("text")]
        merged.append(section)
        inserted = True
    if not inserted:
        merged.append(
            {
                "title": "Публикации и исследования",  # "Publications and research"
                "slug": "publikacii_i_issledovaniya",
                "type": "publications",
                "raw_text": "",
                "paragraphs": [],
                "items": [item["text"] for item in publications if item.get("text")],
                "links": [],
                "publications_count": len(publications),
                "publications": publications,
            }
        )
    return merged
 def _upsert_graduation_theses_section(sections: list[dict], theses: list[dict]) -> list[dict]:
    section = {
        "title": "Выпускные квалификационные работы студентов НИУ ВШЭ",  # "Graduation theses of HSE University students"
        "slug": "vypusknye_kvalifikacionnye_raboty_studentov_niu_vshe",
        "type": "graduation_theses",
        "raw_text": "",
        "paragraphs": [],
        "items": [item["text"] for item in theses if item.get("text")],
        "links": [{"text": item["title"], "url": item["project_url"]} for item in theses if item.get("title") and item.get("project_url")],
        "theses_count": len(theses),
        "theses": theses,
    }
    return [item for item in sections if item.get("type") != "graduation_theses"] + [section]
 def _dedupe_publications(items: list[dict]) -> list[dict]:
    seen = set()
    unique = []
    for item in items:
        key = item.get("id") or item.get("url") or item.get("title")
        if key and key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
 def _dedupe_news_links(items: list[dict]) -> list[dict]:
    seen = set()
    unique = []
    for item in items:
        key = item.get("url") or item.get("title")
        if key and key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
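
Review note: both dedupe helpers above follow the same order-preserving pattern and differ only in how the key is chosen. A minimal sketch of that shared pattern, with the hypothetical names `dedupe_by_key` and `news_key` introduced only for illustration:

```python
from typing import Callable, Hashable, Iterable, Optional


def dedupe_by_key(
    items: Iterable[dict],
    key_of: Callable[[dict], Optional[Hashable]],
) -> list:
    """Keep the first item for each key; drop items whose key is empty."""
    seen = set()
    unique = []
    for item in items:
        key = key_of(item)
        if key and key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


def news_key(item: dict) -> Optional[str]:
    # Same fallback order as the news-link helper: URL first, then title.
    return item.get("url") or item.get("title")
```

Items without any usable key are silently dropped, so a record that has neither a URL nor a title never reaches storage.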
 def _html_to_text(value: object) -> str:
    return normalize_ws(BeautifulSoup(str(value or ""), "html.parser").get_text(" ", strip=True))
 def _localized_text_map(value: object) -> dict[str, str]:
    if not isinstance(value, dict):
        return {}
    localized = {}
    for key in ("ru", "en", "publ"):
        text = _html_to_text(value.get(key))
        if text:
            localized[key] = text
    return localized
 def _localized_value(value: object) -> str:
    if isinstance(value, dict):
        return normalize_ws(value.get("ru") or value.get("publ") or value.get("en"))
    return normalize_ws(str(value or ""))
 def _normalize_publication_authors(value: object, current_author_id: str | None) -> list[dict]:
    if not isinstance(value, dict):
        return []
    authors = []
    for author in value.get("author") or []:
        if not isinstance(author, dict):
            continue
        title = author.get("title") if isinstance(author.get("title"), dict) else {}
        reverse_title = author.get("reverseTitle") if isinstance(author.get("reverseTitle"), dict) else {}
        author_id = normalize_ws(author.get("id"))
        href = normalize_ws(author.get("href"))
        authors.append(
            {
                "id": author_id or None,
                "href": urljoin("https://www.hse.ru", href) if href else None,
                "title_ru": _html_to_text(title.get("ru")),
                "title_en": _html_to_text(title.get("en")),
                "reverse_title_ru": _html_to_text(reverse_title.get("ru")),
                "reverse_title_en": _html_to_text(reverse_title.get("en")),
                "alt_name": normalize_ws(author.get("altName")) or None,
                "other_name": normalize_ws(author.get("otherName")) or None,
                "is_current_employee": bool(current_author_id and author_id == current_author_id),
            }
        )
    return authors
 def _document_href(documents: dict, key: str) -> str | None:
    document = documents.get(key)
    if not isinstance(document, dict):
        return None
    return normalize_ws(document.get("href")) or None
 def _build_publication_citation(title: str, authors: list[dict], year: int | None) -> str:
    author_names = [author.get("title_ru") or author.get("title_en") or author.get("alt_name") for author in authors]
    return normalize_ws(". ".join(part for part in [", ".join(filter(None, author_names)), title, str(year or "")] if part))
 def _int_or_none(value: object) -> int | None:
    try:
        return int(value)
    except (TypeError, ValueError):
        return None
 def _slugify(value: str) -> str:
    cleaned = re.sub(r"[^\w\s-]", "", value.lower(), flags=re.UNICODE)
    return re.sub(r"[-\s]+", "_", cleaned).strip("_") or "section"
@@ -378,3 +908,62 @@ def _dedupe_dicts(items: list[dict]) -> list[dict]:
            seen.add(key)
            unique.append(item)
    return unique
 def _fetch_text(
    session: Session,
    url: str,
    headers: dict[str, str],
    timeout: int,
    *,
    resource_cache=None,
    profile_key: str | None = None,
    resource_key: str,
    resource_manifest: list[dict] | None,
    method: str = "GET",
    json_payload: object | None = None,
    params: dict | None = None,
 ) -> str:
    if resource_cache and profile_key:
        cached = resource_cache.fetch_text(
            session,
            profile_key=profile_key,
            resource_key=resource_key,
            method=method,
            url=url,
            headers=headers,
            timeout=timeout,
            json_payload=json_payload,
            params=params,
        )
        if resource_manifest is not None:
            resource_manifest.append(
                {
                    "resource_key": resource_key,
                    "method": method,
                    "url": url,
                    "body_hash": cached.body_hash,
                    "from_cache": cached.from_cache,
                    "status_code": cached.status_code,
                }
            )
        return cached.text
    if method.upper() == "POST":
        response = session.post(url, json=json_payload, headers=headers, timeout=timeout, params=params)
    else:
        response = session.get(url, headers=headers, timeout=timeout, params=params)
    response.raise_for_status()
    text = response.text
    if resource_manifest is not None:
        resource_manifest.append(
            {
                "resource_key": resource_key,
                "method": method,
                "url": url,
                "body_hash": hashlib.sha256(text.encode("utf-8")).hexdigest(),
                "from_cache": False,
                "status_code": response.status_code,
            }
        )
    return text
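
Review note: on the uncached path the manifest entry carries a SHA-256 digest of the response body, which presumably lets a later run detect that a resource changed without keeping the body itself. A minimal sketch of building such an entry, where `manifest_entry` is a hypothetical helper and not a function from this change:

```python
import hashlib


def manifest_entry(
    resource_key: str,
    method: str,
    url: str,
    text: str,
    status_code: int,
    from_cache: bool = False,
) -> dict:
    """Describe one fetched resource; the body is represented only by its hash."""
    return {
        "resource_key": resource_key,
        "method": method,
        "url": url,
        # Hash the UTF-8 bytes so the digest is stable across runs and platforms.
        "body_hash": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "from_cache": from_cache,
        "status_code": status_code,
    }
```

Two entries with equal `body_hash` values can be treated as the same content for comparison purposes.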
--- a/app/security.py
+++ b/app/security.py
@@ -44,9 +44,3 @@ def require_admin(request: Request, settings: Settings) -> str:
    if not username:
        raise HTTPException(status_code=status.HTTP_303_SEE_OTHER, headers={"Location": "/admin/login"})
    return username
 def require_mcp_token(request: Request, settings: Settings) -> None:
    auth = request.headers.get("authorization", "")
    if not auth.startswith("Bearer ") or not hmac.compare_digest(auth.removeprefix("Bearer ").strip(), settings.mcp_token):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid MCP token")
--- a/app/services/admin_data.py
+++ b/app/services/admin_data.py
@@ -3,11 +3,12 @@ from __future__ import annotations
 from datetime import date, datetime, time
 from math import ceil
 from typing import Any
 from zoneinfo import ZoneInfo
 from sqlalchemy import Select, Text, and_, desc, func, or_, select
 from sqlalchemy.orm import Session
-from app.models import CrawlRun, Employee
+from app.models import CrawlError, CrawlRun, CrawlRunEmployeeChange, Employee, EmployeeNewsLink
 EMPLOYEE_SORTS = {
    "full_name": Employee.full_name,
@@ -23,6 +24,7 @@ def employee_display_payload(employee: Employee) -> dict[str, Any]:
    data = _as_dict(employee.current_data)
    contacts = _as_dict(data.get("contacts"))
    sections = _as_list(data.get("sections"))
    stored_news_links = _stored_news_links(employee)
    positions = _clean_list(data.get("positions"))
    emails = _clean_list(contacts.get("emails"))
    phones = _clean_list(contacts.get("phones"))
@@ -30,6 +32,7 @@ def employee_display_payload(employee: Employee) -> dict[str, Any]:
        "id": employee.id,
        "full_name": employee.full_name,
        "status": employee.status,
        "status_display": _employee_status_display(employee.status),
        "canonical_url": employee.canonical_url,
        "positions": positions,
        "positions_text": "; ".join(positions),
@@ -41,9 +44,13 @@ def employee_display_payload(employee: Employee) -> dict[str, Any]:
        "address": contacts.get("address"),
        "publications_count": _count_section_items(sections, "publications"),
        "courses_count": _count_section_items(sections, "courses_by_year"),
        "news_count": len(stored_news_links) or _count_section_items(sections, "news"),
        "first_seen_at": employee.first_seen_at.isoformat() if employee.first_seen_at else None,
        "last_seen_at": employee.last_seen_at.isoformat() if employee.last_seen_at else None,
        "dismissed_at": employee.dismissed_at.isoformat() if employee.dismissed_at else None,
        "first_seen_display": format_admin_datetime(employee.first_seen_at),
        "last_seen_display": format_admin_datetime(employee.last_seen_at),
        "dismissed_display": format_admin_datetime(employee.dismissed_at),
    }
@@ -62,6 +69,7 @@ def employee_detail_payload(employee: Employee) -> dict[str, Any]:
            "contact_items": _normalize_contact_items(contacts.get("items")),
        },
        "external_ids": _normalize_external_ids(data.get("external_ids")),
        "news_links": _detail_news_links(employee, data),
        "sections": [_normalize_section(section) for section in _as_list(data.get("sections"))],
    }
@@ -107,7 +115,7 @@ def list_employees_page(
    limit: int = 50,
    offset: int = 0,
 ) -> dict[str, Any]:
-    limit = max(1, min(limit, 200))
+    limit = limit if limit in {25, 50, 100} else 50
    offset = max(0, offset)
    base_stmt = build_employee_query(
        status=status,
@@ -121,7 +129,7 @@ def list_employees_page(
    order = desc(sort_column) if direction == "desc" else sort_column
    employees = db.scalars(base_stmt.order_by(order).limit(limit).offset(offset)).all()
    return {
-        "items": [employee_display_payload(employee) for employee in employees],
+        "employees": [employee_display_payload(employee) for employee in employees],
        "total": total,
        "limit": limit,
        "offset": offset,
@@ -148,16 +156,20 @@ def stats_payload(db: Session) -> dict[str, Any]:
 def run_payload(run: CrawlRun | None) -> dict[str, Any] | None:
    if not run:
        return None
-    processed = run.parsed_count + run.error_count
+    processed = run.parsed_count + run.skipped_count + run.error_count
    percent = round((processed / run.found_count) * 100, 1) if run.found_count else 0
    return {
        "id": run.id,
        "source_url": run.source_url,
        "status": run.status,
        "status_display": _run_status_display(run.status),
        "started_at": run.started_at.isoformat() if run.started_at else None,
        "finished_at": run.finished_at.isoformat() if run.finished_at else None,
        "started_display": format_admin_datetime(run.started_at),
        "finished_display": format_admin_datetime(run.finished_at),
        "found_count": run.found_count,
        "parsed_count": run.parsed_count,
        "skipped_count": run.skipped_count,
        "new_count": run.new_count,
        "error_count": run.error_count,
        "dismissed_count": run.dismissed_count,
@@ -167,6 +179,97 @@ def run_payload(run: CrawlRun | None) -> dict[str, Any] | None:
    }
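
Review note: with `skipped_count` now part of `processed`, unchanged profiles advance the progress figure too. A small sketch of the arithmetic, using a hypothetical `crawl_progress` function with plain integers in place of the `CrawlRun` fields:

```python
def crawl_progress(found: int, parsed: int, skipped: int, errors: int) -> float:
    """Percentage of discovered profiles that have been handled in any way."""
    processed = parsed + skipped + errors
    # Guard against division by zero when nothing has been discovered yet.
    return round((processed / found) * 100, 1) if found else 0
```

For example, 50 parsed, 120 skipped, and 10 failed profiles out of 200 found give 90.0 percent; without the skipped term the same run would report 30.0 percent.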
 def run_detail_payload(db: Session, run: CrawlRun | None) -> dict[str, Any] | None:
    if not run:
        return None
    changes = db.scalars(
        select(CrawlRunEmployeeChange)
        .where(CrawlRunEmployeeChange.crawl_run_id == run.id)
        .order_by(CrawlRunEmployeeChange.created_at, CrawlRunEmployeeChange.id)
    ).all()
    errors = db.scalars(select(CrawlError).where(CrawlError.crawl_run_id == run.id).order_by(CrawlError.created_at)).all()
    grouped_changes = {"new": [], "missing_from_source": [], "dismissed": []}
    for change in changes:
        grouped_changes.setdefault(change.change_type, []).append(_change_payload(change))
    return {
        **(run_payload(run) or {}),
        "changes_detail_available": bool(changes),
        "changes": grouped_changes,
        "errors": [_crawl_error_payload(error) for error in errors],
    }
 def format_admin_datetime(value: Any) -> str:
    if not value:
        return "Не указано"  # "Not specified"
    if isinstance(value, str):
        try:
            value = datetime.fromisoformat(value.replace("Z", "+00:00"))
        except ValueError:
            return value
    if not isinstance(value, datetime):
        return str(value)
    if value.tzinfo:
        value = value.astimezone(ZoneInfo("Europe/Moscow"))
    return value.strftime("%d.%m.%Y %H:%M")
 def _employee_status_display(status: str | None) -> str:
    labels = {"active": "Работает", "dismissed": "Уволен"}  # "Employed", "Dismissed"
    return labels.get(status or "", status or "Не указано")  # fallback: "Not specified"
 def _run_status_display(status: str | None) -> str:
    labels = {"running": "Выполняется", "completed": "Завершен", "failed": "Ошибка"}  # "Running", "Completed", "Error"
    return labels.get(status or "", status or "Не указано")  # fallback: "Not specified"
 def _change_payload(change: CrawlRunEmployeeChange) -> dict[str, Any]:
    return {
        "id": change.id,
        "employee_id": change.employee_id,
        "profile_key": change.profile_key,
        "profile_url": change.profile_url,
        "full_name": change.full_name,
        "change_type": change.change_type,
        "change_type_display": _change_type_display(change.change_type),
        "profile_available": change.profile_available,
        "profile_available_display": _profile_available_display(change.profile_available),
        "message": change.message,
        "created_at": change.created_at.isoformat() if change.created_at else None,
        "created_display": format_admin_datetime(change.created_at),
    }
 def _crawl_error_payload(error: CrawlError) -> dict[str, Any]:
    return {
        "id": error.id,
        "crawl_run_id": error.crawl_run_id,
        "profile_url": error.profile_url,
        "error_type": error.error_type,
        "message": error.message,
        "created_at": error.created_at.isoformat() if error.created_at else None,
        "created_display": format_admin_datetime(error.created_at),
    }
 def _change_type_display(change_type: str | None) -> str:
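    labels = {
        "new": "Новый",  # "New"
        "missing_from_source": "Потеряшка",  # colloquial "gone missing": absent from the source listing
        "dismissed": "Уволен",  # "Dismissed"
    }
    return labels.get(change_type or "", change_type or "Не указано")  # fallback: "Not specified"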
 def _profile_available_display(value: bool | None) -> str:
    if value is True:
        return "Профиль доступен"  # "Profile available"
    if value is False:
        return "Профиль недоступен"  # "Profile unavailable"
    return "Не проверялось"  # "Not checked"
 def _count_section_items(sections: list[dict[str, Any]], section_type: str) -> int:
    total = 0
    for section in sections:
@@ -176,6 +279,8 @@ def _count_section_items(sections: list[dict[str, Any]], section_type: str) -> i
            total += len(section.get("publications") or section.get("items") or [])
        elif section_type == "courses_by_year":
            total += len(section.get("courses") or [])
        elif section_type == "news":
            total += len(section.get("news_links") or section.get("items") or [])
    return total
@@ -243,11 +348,15 @@ def _normalize_section(section: Any) -> dict[str, Any]:
        "type": section_type,
        "raw_text": raw_text,
        "paragraphs": paragraphs,
-        "items": items,
+        "list_items": items,
        "links": _normalize_links(section.get("links")),
        "year_entries": _normalize_year_entries(section.get("year_entries")),
        "publications": _normalize_publications(section.get("publications")),
        "publications_count": section.get("publications_count"),
        "news_links": _normalize_news_links(section.get("news_links")),
        "news_count": section.get("news_count"),
        "theses": _normalize_theses(section.get("theses")),
        "theses_count": section.get("theses_count"),
        "academic_year": section.get("academic_year"),
        "courses": _normalize_courses(section.get("courses")),
        "table": _normalize_table(section.get("table")),
@@ -268,6 +377,77 @@ def _normalize_links(items: Any) -> list[dict[str, str | None]]:
    return normalized
 def _stored_news_links(employee: Employee) -> list[dict[str, Any]]:
    return [_stored_news_link_payload(item) for item in sorted(employee.news_links, key=_news_link_sort_key)]
 def _news_link_sort_key(item: EmployeeNewsLink) -> tuple:
    timestamp = item.published_at.timestamp() if item.published_at else 0
    return (-timestamp, item.title or "", item.id)
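
Review note: the sort key above orders dated links newest first and pushes undated links to the end, with title and id as tie-breakers. A minimal sketch that exercises the same key, using a hypothetical `NewsLink` dataclass as a stand-in for the `EmployeeNewsLink` model:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class NewsLink:
    # Hypothetical stand-in with only the fields the sort key reads.
    id: int
    title: Optional[str]
    published_at: Optional[datetime]


def news_link_sort_key(item: NewsLink) -> tuple:
    # Negating the timestamp turns an ascending sort into newest-first order.
    timestamp = item.published_at.timestamp() if item.published_at else 0
    return (-timestamp, item.title or "", item.id)


links = [
    NewsLink(1, "B", datetime(2025, 1, 1, tzinfo=timezone.utc)),
    NewsLink(2, "A", datetime(2026, 5, 1, tzinfo=timezone.utc)),
    NewsLink(3, "C", None),
    NewsLink(4, "A", None),
]
ordered_ids = [link.id for link in sorted(links, key=news_link_sort_key)]
```

One consequence of using `0` for missing dates is that a link dated before 1970 would sort after the undated ones, which is unlikely to matter for news items.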
 def _stored_news_link_payload(item: EmployeeNewsLink) -> dict[str, Any]:
    return {
        "title": item.title,
        "url": item.url,
        "summary": item.summary,
        "published_at": item.published_at.isoformat() if item.published_at else None,
        "published_year": item.published_year,
        "published_display": format_admin_date(item.published_at) if item.published_at else str(item.published_year or ""),
    }
 def _detail_news_links(employee: Employee, data: dict[str, Any]) -> list[dict[str, Any]]:
    stored = _stored_news_links(employee)
    if stored:
        return stored
    for section in _as_list(data.get("sections")):
        if isinstance(section, dict) and section.get("type") == "news":
            return _normalize_news_links(section.get("news_links"))
    return []
 def format_admin_date(value: Any) -> str:
    if not value:
        return ""
    if isinstance(value, str):
        try:
            value = datetime.fromisoformat(value.replace("Z", "+00:00"))
        except ValueError:
            return value
    if not isinstance(value, datetime):
        return str(value)
    if value.tzinfo:
        value = value.astimezone(ZoneInfo("Europe/Moscow"))
    return value.strftime("%d.%m.%Y")
 def _normalize_news_links(items: Any) -> list[dict[str, Any]]:
    normalized = []
    if not isinstance(items, list):
        return normalized
    for item in items:
        if not isinstance(item, dict):
            continue
        title = str(item.get("title") or item.get("url") or "").strip()
        url = str(item.get("url") or "").strip()
        summary = str(item.get("summary") or "").strip()
        published_at = str(item.get("published_at") or "").strip()
        published_year = item.get("published_year")
        if title or url:
            normalized.append(
                {
                    "title": title or url,
                    "url": url or None,
                    "summary": summary or None,
                    "published_at": published_at or None,
                    "published_year": published_year,
                    "published_display": format_admin_date(published_at) if published_at else str(published_year or ""),
                }
            )
    return normalized
 def _normalize_year_entries(items: Any) -> list[dict[str, Any]]:
    normalized = []
    if not isinstance(items, list):
@@ -316,6 +496,35 @@ def _normalize_courses(items: Any) -> list[dict[str, str | None]]:
    return normalized
 def _normalize_theses(items: Any) -> list[dict[str, Any]]:
    normalized = []
    if not isinstance(items, list):
        return normalized
    for item in items:
        if not isinstance(item, dict):
            continue
        title = str(item.get("title") or "").strip()
        student = str(item.get("student") or "").strip()
        if not title and not student:
            continue
        normalized.append(
            {
                "id": item.get("id"),
                "student": student,
                "title": title,
                "defense_year": item.get("defense_year") or item.get("year"),
                "level": str(item.get("level") or "").strip(),
                "rating": item.get("rating"),
                "project_url": str(item.get("project_url") or "").strip() or None,
                "program": str(item.get("program") or "").strip(),
                "program_url": str(item.get("program_url") or "").strip() or None,
                "org_unit": str(item.get("org_unit") or "").strip(),
                "org_unit_url": str(item.get("org_unit_url") or "").strip() or None,
            }
        )
    return normalized
 def _normalize_table(table: Any) -> dict[str, Any] | None:
    if not isinstance(table, dict):
        return None
--- a/app/services/crawler.py
+++ b/app/services/crawler.py
@@ -1,18 +1,31 @@
 import gzip
 import hashlib
 import json
 import re
 import time
 from datetime import datetime, timezone
 import requests
-from sqlalchemy import select
+from sqlalchemy import inspect, select
 from sqlalchemy.orm import Session
 from app.config import Settings
-from app.models import CrawlError, CrawlRun, Employee, EmployeeSnapshot, ParserSource, ProfileTab
+from app.models import (
    CrawlError,
    CrawlRun,
    CrawlRunEmployeeChange,
    Employee,
    EmployeeNewsLink,
    EmployeePublication,
    EmployeeSnapshot,
    ParserSource,
    ProfileTab,
 )
 from app.parser.collector import collect_profile_links
 from app.parser.profile import parse_person_profile
 from app.parser.profile_url import profile_key
 from app.services.dataset_versions import get_or_create_current_version
 from app.services.resource_cache import ResourceCache
 HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; MIEMEmployeesBot/0.1.0; +https://miem.hse.ru/)"
@@ -28,8 +41,10 @@ def run_crawl(db: Session, settings: Settings) -> CrawlRun:
    found_keys: set[str] = set()
    parsed_count = 0
    skipped_count = 0
    try:
        with requests.Session() as session:
            resource_cache = ResourceCache(db)
            urls = collect_profile_links(session, source.source_url, HEADERS, settings.request_timeout)
            if settings.crawl_limit:
                urls = urls[: settings.crawl_limit]
@@ -47,12 +62,17 @@ def run_crawl(db: Session, settings: Settings) -> CrawlRun:
                        HEADERS,
                        settings.request_timeout,
                        settings.parser_use_playwright,
                        resource_cache=resource_cache,
                    )
                    if not parsed:
                        continue
-                    _upsert_employee(db, run, parsed)
+                    _, changed = _upsert_employee(db, run, parsed)
                    if changed:
                        parsed_count += 1
                    else:
                        skipped_count += 1
                    run.parsed_count = parsed_count
                    run.skipped_count = skipped_count
                    db.commit()
                except Exception as exc:
                    run.error_count += 1
@@ -68,8 +88,9 @@ def run_crawl(db: Session, settings: Settings) -> CrawlRun:
                finally:
                    time.sleep(settings.request_delay_seconds)
-        run.dismissed_count = _mark_dismissed(db, found_keys)
+            run.dismissed_count = _mark_dismissed(db, run, found_keys, session, settings.request_timeout)
        run.status = "completed"
        get_or_create_current_version(db, crawl_run_id=run.id)
    except Exception as exc:
        run.status = "failed"
        run.message = str(exc)
@@ -80,6 +101,54 @@ def run_crawl(db: Session, settings: Settings) -> CrawlRun:
    return run
 def refresh_employee(db: Session, employee: Employee, settings: Settings) -> CrawlRun:
    run = CrawlRun(source_url=employee.canonical_url, status="running", found_count=1)
    db.add(run)
    db.commit()
    db.refresh(run)
    try:
        with requests.Session() as session:
            resource_cache = ResourceCache(db)
            parsed = parse_person_profile(
                session,
                employee.canonical_url,
                HEADERS,
                settings.request_timeout,
                settings.parser_use_playwright,
                resource_cache=resource_cache,
            )
        if not parsed:
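            raise ValueError("Профиль не удалось распарсить.")  # "The profile could not be parsed."
        if _parsed_profile_key(parsed) != employee.profile_key:
            raise ValueError("Распарсенный профиль не совпадает с обновляемым сотрудником.")  # "The parsed profile does not match the employee being refreshed."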
        _, changed = _upsert_employee(db, run, parsed)
        if changed:
            run.parsed_count = 1
        else:
            run.skipped_count = 1
        run.status = "completed"
        get_or_create_current_version(db, crawl_run_id=run.id)
    except Exception as exc:
        run.status = "failed"
        run.error_count = 1
        run.message = str(exc)
        db.add(
            CrawlError(
                crawl_run_id=run.id,
                profile_url=employee.canonical_url,
                error_type=type(exc).__name__,
                message=str(exc),
            )
        )
    finally:
        run.finished_at = datetime.now(timezone.utc)
        db.commit()
        db.refresh(run)
    return run
 def _ensure_source(db: Session, source_url: str) -> ParserSource:
    source = db.scalar(select(ParserSource).where(ParserSource.source_url == source_url))
    if source:
@@ -91,10 +160,15 @@ def _ensure_source(db: Session, source_url: str) -> ParserSource:
    return source
-def _upsert_employee(db: Session, run: CrawlRun, parsed: dict) -> Employee:
+def _parsed_profile_key(parsed: dict) -> str:
    return f"{parsed.get('profile_type')}:{parsed.get('profile_id')}"
 def _upsert_employee(db: Session, run: CrawlRun, parsed: dict) -> tuple[Employee, bool]:
    html = parsed.pop("_html", None)
    parsed.pop("_resource_manifest", None)
    checksum = _checksum(parsed)
-    key = f"{parsed.get('profile_type')}:{parsed.get('profile_id')}"
+    key = _parsed_profile_key(parsed)
    employee = db.scalar(select(Employee).where(Employee.profile_key == key))
    now = datetime.now(timezone.utc)
    if not employee:
@@ -107,16 +181,33 @@ def _upsert_employee(db: Session, run: CrawlRun, parsed: dict) -> Employee:
        )
        db.add(employee)
        run.new_count += 1
        is_new = True
    else:
        is_new = False
    parser_version = parsed.get("parser_version")
    changed = is_new or employee.current_checksum != checksum or employee.parser_version != parser_version
    employee.full_name = parsed.get("full_name")
    employee.status = "active"
    employee.last_seen_at = now
    employee.dismissed_at = None
-    employee.parser_version = parsed.get("parser_version")
+    employee.parser_version = parser_version
    if changed:
        employee.current_data = parsed
    employee.current_checksum = checksum
    db.flush()
    if is_new:
        _record_employee_change(
            db,
            run,
            employee,
            "new",
            profile_available=True,
            message="Сотрудник впервые найден в источнике.",  # "Employee found in the source for the first time."
        )
    if changed:
        db.query(ProfileTab).filter(ProfileTab.employee_id == employee.id).delete()
        for tab in parsed.get("tabs") or []:
            db.add(
@@ -135,26 +226,315 @@ def _upsert_employee(db: Session, run: CrawlRun, parsed: dict) -> Employee:
                parsed_data=parsed,
                html_snapshot=gzip.compress(html.encode("utf-8")) if html else None,
                checksum=checksum,
-            parser_version=parsed.get("parser_version"),
+                parser_version=parser_version,
            )
        )
-    return employee
+    db.flush()
    _try_sync_employee_publications(db, run, employee, parsed)
    _try_sync_employee_news_links(db, run, employee, parsed)
    return employee, changed
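
Review note: the `changed` flag returned here decides whether a profile counts as parsed or skipped. A minimal sketch of that decision, assuming the checksum is a SHA-256 over canonical JSON; the body of `_checksum` is not visible in this diff, so `checksum` and `is_changed` below are hypothetical illustrations rather than the project's code.

```python
import hashlib
import json
from typing import Optional


def checksum(payload: dict) -> str:
    """Assumed scheme: hash of JSON with sorted keys, so key order is irrelevant."""
    canonical = json.dumps(payload, ensure_ascii=False, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_changed(
    is_new: bool,
    stored_checksum: Optional[str],
    stored_parser_version: Optional[str],
    parsed: dict,
) -> bool:
    # A profile is rewritten when it is new, its data differs,
    # or it was last stored by a different parser version.
    return (
        is_new
        or stored_checksum != checksum(parsed)
        or stored_parser_version != parsed.get("parser_version")
    )
```

Comparing the parser version as well means that bumping the version forces every profile to be stored again, even when the parsed payload happens to hash the same.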
-def _mark_dismissed(db: Session, found_keys: set[str]) -> int:
+def _try_sync_employee_publications(db: Session, run: CrawlRun, employee: Employee, parsed: dict) -> None:
    try:
        if not _publication_payloads(parsed):
            return
        if not _employee_publications_table_exists(db):
            return
        with db.begin_nested():
            _sync_employee_publications(db, employee, parsed)
    except Exception as exc:
        db.add(
            CrawlError(
                crawl_run_id=run.id,
                profile_url=employee.canonical_url,
                error_type=type(exc).__name__,
                message=f"Не удалось сохранить публикации сотрудника: {exc}",
            )
        )
 def _employee_publications_table_exists(db: Session) -> bool:
    return inspect(db.connection()).has_table(EmployeePublication.__tablename__)
 def _sync_employee_publications(db: Session, employee: Employee, parsed: dict) -> None:
    publications = _publication_payloads(parsed)
    seen_hashes = set()
    for publication in publications:
        source_hash = _publication_hash(publication)
        seen_hashes.add(source_hash)
        publication_id = _clean_optional(publication.get("publication_id") or publication.get("id"))
        existing = None
        if publication_id:
            existing = db.scalar(
                select(EmployeePublication).where(
                    EmployeePublication.employee_id == employee.id,
                    EmployeePublication.publication_id == publication_id,
                )
            )
        if not existing:
            existing = db.scalar(
                select(EmployeePublication).where(
                    EmployeePublication.employee_id == employee.id,
                    EmployeePublication.source_hash == source_hash,
                )
            )
        if not existing:
            existing = EmployeePublication(employee_id=employee.id, source_hash=source_hash, title=_publication_title(publication))
            db.add(existing)
        _apply_publication(existing, publication, source_hash)
    if seen_hashes:
        stale = db.scalars(
            select(EmployeePublication).where(
                EmployeePublication.employee_id == employee.id,
                EmployeePublication.source_hash.not_in(seen_hashes),
            )
        ).all()
        for item in stale:
            db.delete(item)
 def _publication_payloads(parsed: dict) -> list[dict]:
    publications = []
    for section in parsed.get("sections") or []:
        if not isinstance(section, dict) or section.get("type") != "publications":
            continue
        for publication in section.get("publications") or []:
            if isinstance(publication, dict):
                publications.append(publication)
    return publications
 def _apply_publication(target: EmployeePublication, publication: dict, source_hash: str) -> None:
    target.publication_id = _clean_optional(publication.get("publication_id") or publication.get("id"))
    target.title = _publication_title(publication)
    target.year = _int_or_none(publication.get("year"))
    target.publication_type = _clean_optional(publication.get("publication_type") or publication.get("type"))
    target.language = _clean_optional(publication.get("language"))
    target.status = _int_or_none(publication.get("status"))
    target.url = _clean_optional(publication.get("url"))
    target.doi_url = _clean_optional(publication.get("doi_url"))
    target.other_url = _clean_optional(publication.get("other_url"))
    target.document_url = _clean_optional(publication.get("document_url"))
    target.citation_text = _clean_optional(publication.get("citation_text") or publication.get("text"))
    target.annotation = publication.get("annotation") if isinstance(publication.get("annotation"), dict) else None
    target.description = publication.get("description") if isinstance(publication.get("description"), dict) else None
    target.authors = publication.get("authors") if isinstance(publication.get("authors"), list) else None
    target.raw_data = publication.get("raw_data") if isinstance(publication.get("raw_data"), dict) else publication
    target.source_hash = source_hash
 def _publication_hash(publication: dict) -> str:
    return _payload_hash(publication.get("raw_data") if isinstance(publication.get("raw_data"), dict) else publication)
 def _payload_hash(value: object) -> str:
    payload = json.dumps(_stable_checksum_payload(value), ensure_ascii=False, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
 def _publication_title(publication: dict) -> str:
    return _clean_optional(publication.get("title") or publication.get("text") or publication.get("id")) or "Untitled publication"
 def _clean_optional(value: object) -> str | None:
    text = str(value or "").strip()
    return text or None
 def _int_or_none(value: object) -> int | None:
    try:
        return int(value)
    except (TypeError, ValueError):
        return None
 def _try_sync_employee_news_links(db: Session, run: CrawlRun, employee: Employee, parsed: dict) -> None:
    try:
        if not _news_link_payloads(parsed):
            return
        if not _employee_news_links_table_exists(db):
            return
        with db.begin_nested():
            _sync_employee_news_links(db, employee, parsed)
    except Exception as exc:
        db.add(
            CrawlError(
                crawl_run_id=run.id,
                profile_url=employee.canonical_url,
                error_type=type(exc).__name__,
                message=f"Не удалось сохранить новости сотрудника: {exc}",
            )
        )
 def _employee_news_links_table_exists(db: Session) -> bool:
    return inspect(db.connection()).has_table(EmployeeNewsLink.__tablename__)
 def _sync_employee_news_links(db: Session, employee: Employee, parsed: dict) -> None:
    news_links = _news_link_payloads(parsed)
    seen_hashes = set()
    for news_link in news_links:
        source_hash = _news_link_hash(news_link)
        seen_hashes.add(source_hash)
        url = _clean_optional(news_link.get("url"))
        existing = None
        if url:
            existing = db.scalar(
                select(EmployeeNewsLink).where(
                    EmployeeNewsLink.employee_id == employee.id,
                    EmployeeNewsLink.url == url,
                )
            )
        if not existing:
            existing = db.scalar(
                select(EmployeeNewsLink).where(
                    EmployeeNewsLink.employee_id == employee.id,
                    EmployeeNewsLink.source_hash == source_hash,
                )
            )
        if not existing:
            existing = EmployeeNewsLink(employee_id=employee.id, source_hash=source_hash, title=_news_link_title(news_link))
            db.add(existing)
        _apply_news_link(existing, news_link, source_hash)
    if seen_hashes:
        stale = db.scalars(
            select(EmployeeNewsLink).where(
                EmployeeNewsLink.employee_id == employee.id,
                EmployeeNewsLink.source_hash.not_in(seen_hashes),
            )
        ).all()
        for item in stale:
            db.delete(item)
 def _news_link_payloads(parsed: dict) -> list[dict]:
    news_links = []
    for section in parsed.get("sections") or []:
        if not isinstance(section, dict) or section.get("type") != "news":
            continue
        for item in section.get("news_links") or []:
            if isinstance(item, dict):
                news_links.append(item)
    return news_links
 def _apply_news_link(target: EmployeeNewsLink, news_link: dict, source_hash: str) -> None:
    target.title = _news_link_title(news_link)
    target.url = _clean_optional(news_link.get("url"))
    target.summary = _clean_optional(news_link.get("summary"))
    target.published_at = _datetime_or_none(news_link.get("published_at"))
    target.published_year = _int_or_none(news_link.get("published_year"))
    target.raw_data = news_link.get("raw_data") if isinstance(news_link.get("raw_data"), dict) else news_link
    target.source_hash = source_hash
 def _news_link_hash(news_link: dict) -> str:
    return _payload_hash(news_link.get("raw_data") if isinstance(news_link.get("raw_data"), dict) else news_link)
 def _news_link_title(news_link: dict) -> str:
    return _clean_optional(news_link.get("title") or news_link.get("url")) or "Untitled news"
 def _datetime_or_none(value: object) -> datetime | None:
    if isinstance(value, datetime):
        return value
    if not value:
        return None
    try:
        parsed = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
    except ValueError:
        return None
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
 def _mark_dismissed(db: Session, run: CrawlRun, found_keys: set[str], session: requests.Session, timeout: int) -> int:
    dismissed = 0
    active = db.scalars(select(Employee).where(Employee.status == "active")).all()
    now = datetime.now(timezone.utc)
    for employee in active:
        if employee.profile_key in found_keys:
            continue
        profile_available = _profile_is_available(session, employee.canonical_url, timeout)
        if profile_available:
            _record_employee_change(
                db,
                run,
                employee,
                "missing_from_source",
                profile_available=True,
                message="Профиль доступен, но ссылка отсутствует в исходном списке.",
            )
            continue
        employee.status = "dismissed"
        employee.dismissed_at = now
        _record_employee_change(
            db,
            run,
            employee,
            "dismissed",
            profile_available=False,
            message="Сотрудник отсутствует в исходном списке, профиль не подтвердился как доступный.",
        )
        dismissed += 1
    db.commit()
    return dismissed
 def _profile_is_available(session: requests.Session, url: str, timeout: int) -> bool:
    try:
        response = session.get(url, headers=HEADERS, timeout=timeout, allow_redirects=True)
        return response.status_code < 400
    except requests.RequestException:
        return False
 def _record_employee_change(
    db: Session,
    run: CrawlRun,
    employee: Employee,
    change_type: str,
    *,
    profile_available: bool | None,
    message: str,
 ) -> None:
    db.add(
        CrawlRunEmployeeChange(
            crawl_run_id=run.id,
            employee_id=employee.id,
            profile_key=employee.profile_key,
            profile_url=employee.canonical_url,
            full_name=employee.full_name,
            change_type=change_type,
            profile_available=profile_available,
            message=message,
        )
    )
 def _checksum(data: dict) -> str:
-    payload = json.dumps(data, ensure_ascii=False, sort_keys=True, separators=(",", ":"))
+    payload = json.dumps(_stable_checksum_payload(data), ensure_ascii=False, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
 def _stable_checksum_payload(value):
    if isinstance(value, dict):
        return {key: _stable_checksum_payload(item) for key, item in value.items()}
    if isinstance(value, list):
        return [_stable_checksum_payload(item) for item in value]
    if isinstance(value, str):
        return _normalize_date_dependent_experience(value)
    return value
 def _normalize_date_dependent_experience(value: str) -> str:
    return re.sub(
        r"(?i)(стаж(?:\s+работы)?(?:\s+в\s+ниу\s+вшэ|\s+в\s+вшэ)?\s*:?\s*)\d+\s*(?:год(?:а|ов)?|лет)",
        r"\1<experience-years>",
        value,
    )
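
The checksum change above can be illustrated in isolation. This is a minimal sketch, not the project's module: the function names without the leading underscore and the sample employee dicts are assumptions made for the example, while the regular expression is copied from the diff. It suggests that a profile whose only difference is the year count in the experience line keeps the same checksum, while a real change still alters it.

```python
import hashlib
import json
import re


def normalize_experience(value: str) -> str:
    # Pattern copied from the diff: the year count after "стаж ..." becomes a placeholder.
    return re.sub(
        r"(?i)(стаж(?:\s+работы)?(?:\s+в\s+ниу\s+вшэ|\s+в\s+вшэ)?\s*:?\s*)\d+\s*(?:год(?:а|ов)?|лет)",
        r"\1<experience-years>",
        value,
    )


def stable_payload(value):
    # Walk dicts and lists recursively and normalize every string leaf.
    if isinstance(value, dict):
        return {key: stable_payload(item) for key, item in value.items()}
    if isinstance(value, list):
        return [stable_payload(item) for item in value]
    if isinstance(value, str):
        return normalize_experience(value)
    return value


def checksum(data: dict) -> str:
    payload = json.dumps(stable_payload(data), ensure_ascii=False, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical sample profiles, used only for this illustration.
last_year = {"full_name": "Иван Иванов", "facts": ["Стаж работы в НИУ ВШЭ: 5 лет"]}
this_year = {"full_name": "Иван Иванов", "facts": ["Стаж работы в НИУ ВШЭ: 6 лет"]}
renamed = {"full_name": "Иван Петров", "facts": ["Стаж работы в НИУ ВШЭ: 6 лет"]}

same_after_anniversary = checksum(last_year) == checksum(this_year)
differs_after_rename = checksum(this_year) != checksum(renamed)
```

Under these assumptions both flags are true, so a crawl run on a later date would not mark the profile as changed only because the experience counter advanced.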
--- a/app/services/dataset_versions.py
+++ b/app/services/dataset_versions.py
@@ -0,0 +1,227 @@
 import hashlib
 import json
 from dataclasses import dataclass
 from sqlalchemy import desc, select
 from sqlalchemy.orm import Session
 from app.models import DatasetVersion, DatasetVersionItem, Employee
 @dataclass(frozen=True)
 class EmployeeMarker:
    profile_key: str
    employee_id: int | None
    status: str
    checksum: str
 def get_or_create_current_version(db: Session, *, crawl_run_id: int | None = None) -> DatasetVersion:
    employees = db.scalars(select(Employee).order_by(Employee.profile_key)).all()
    markers = [_employee_marker(employee) for employee in employees]
    dataset_hash = _dataset_hash(markers)
    latest = get_latest_version(db)
    if latest and latest.hash == dataset_hash:
        return latest
    active_count = sum(1 for marker in markers if marker.status == "active")
    dismissed_count = sum(1 for marker in markers if marker.status == "dismissed")
    version = DatasetVersion(
        hash=dataset_hash,
        previous_hash=latest.hash if latest else None,
        crawl_run_id=crawl_run_id,
        employee_count=len(markers),
        active_count=active_count,
        dismissed_count=dismissed_count,
    )
    db.add(version)
    db.flush()
    for marker in markers:
        db.add(
            DatasetVersionItem(
                dataset_version_id=version.id,
                profile_key=marker.profile_key,
                employee_id=marker.employee_id,
                status=marker.status,
                checksum=marker.checksum,
            )
        )
    db.flush()
    return version
 def get_latest_version(db: Session) -> DatasetVersion | None:
    return db.scalar(select(DatasetVersion).order_by(desc(DatasetVersion.created_at), desc(DatasetVersion.id)).limit(1))
 def get_version_by_hash(db: Session, dataset_hash: str | None) -> DatasetVersion | None:
    if not dataset_hash:
        return None
    return db.scalar(select(DatasetVersion).where(DatasetVersion.hash == dataset_hash).limit(1))
 def service_info_payload(db: Session, *, tools: list[dict], service_name: str, backend_version: str, protocol_version: str) -> dict:
    version = get_or_create_current_version(db)
    db.commit()
    return {
        "service_name": service_name,
        "backend_version": backend_version,
        "protocolVersion": protocol_version,
        "tools": tools,
        "dataset": _version_payload(version),
    }
 def sync_employees_payload(db: Session, *, client_hash: str | None = None, include_data: bool = True) -> dict:
    current = get_or_create_current_version(db)
    db.commit()
    if not client_hash:
        return _full_sync_payload(db, current, include_data=include_data, reason=None)
    if client_hash == current.hash:
        return {
            "mode": "delta",
            "from_hash": client_hash,
            "to_hash": current.hash,
            "dataset": _version_payload(current),
            "changes": {"added": [], "updated": [], "dismissed": [], "removed": []},
        }
    previous = get_version_by_hash(db, client_hash)
    if not previous:
        return _full_sync_payload(db, current, include_data=include_data, reason="unknown_client_hash", from_hash=client_hash)
    return _delta_sync_payload(db, previous, current, include_data=include_data)
 def _full_sync_payload(
    db: Session,
    current: DatasetVersion,
    *,
    include_data: bool,
    reason: str | None,
    from_hash: str | None = None,
 ) -> dict:
    employees = db.scalars(select(Employee).order_by(Employee.profile_key)).all()
    payload = {
        "mode": "full",
        "from_hash": from_hash,
        "to_hash": current.hash,
        "dataset": _version_payload(current),
        "items": [_employee_payload(employee, include_data=include_data) for employee in employees],
    }
    if reason:
        payload["reason"] = reason
    return payload
 def _delta_sync_payload(db: Session, previous: DatasetVersion, current: DatasetVersion, *, include_data: bool) -> dict:
    previous_items = _items_by_profile_key(previous)
    current_items = _items_by_profile_key(current)
    employees = {employee.profile_key: employee for employee in db.scalars(select(Employee)).all()}
    added = []
    updated = []
    dismissed = []
    removed = []
    for profile_key, current_item in sorted(current_items.items()):
        previous_item = previous_items.get(profile_key)
        employee = employees.get(profile_key)
        if not previous_item:
            if employee:
                added.append(_employee_payload(employee, include_data=include_data))
            continue
        if previous_item.checksum == current_item.checksum and previous_item.status == current_item.status:
            continue
        if current_item.status == "dismissed":
            dismissed.append(_tombstone(profile_key, current_item.status, employee))
        elif employee:
            updated.append(_employee_payload(employee, include_data=include_data))
    for profile_key, previous_item in sorted(previous_items.items()):
        if profile_key not in current_items:
            removed.append(_tombstone(profile_key, "removed", employees.get(profile_key), checksum=previous_item.checksum))
    return {
        "mode": "delta",
        "from_hash": previous.hash,
        "to_hash": current.hash,
        "dataset": _version_payload(current),
        "changes": {
            "added": added,
            "updated": updated,
            "dismissed": dismissed,
            "removed": removed,
        },
    }
 def _items_by_profile_key(version: DatasetVersion) -> dict[str, DatasetVersionItem]:
    return {item.profile_key: item for item in version.items}
 def _version_payload(version: DatasetVersion) -> dict:
    return {
        "hash": version.hash,
        "previous_hash": version.previous_hash,
        "created_at": version.created_at.isoformat() if version.created_at else None,
        "crawl_run_id": version.crawl_run_id,
        "employee_count": version.employee_count,
        "active_count": version.active_count,
        "dismissed_count": version.dismissed_count,
    }
 def _employee_marker(employee: Employee) -> EmployeeMarker:
    return EmployeeMarker(
        profile_key=employee.profile_key,
        employee_id=employee.id,
        status=employee.status,
        checksum=employee.current_checksum or _payload_hash(employee.current_data or {}),
    )
 def _dataset_hash(markers: list[EmployeeMarker]) -> str:
    payload = [
        {"profile_key": marker.profile_key, "status": marker.status, "checksum": marker.checksum}
        for marker in sorted(markers, key=lambda item: item.profile_key)
    ]
    return _payload_hash(payload)
 def _payload_hash(value: object) -> str:
    payload = json.dumps(value, ensure_ascii=False, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
 def _employee_payload(employee: Employee, *, include_data: bool) -> dict:
    payload = {
        "profile_key": employee.profile_key,
        "profile_id": employee.profile_id,
        "full_name": employee.full_name,
        "status": employee.status,
        "canonical_url": employee.canonical_url,
        "last_seen_at": employee.last_seen_at.isoformat() if employee.last_seen_at else None,
        "dismissed_at": employee.dismissed_at.isoformat() if employee.dismissed_at else None,
        "checksum": employee.current_checksum or _payload_hash(employee.current_data or {}),
    }
    if include_data:
        payload["data"] = employee.current_data
    return payload
 def _tombstone(profile_key: str, status: str, employee: Employee | None, *, checksum: str | None = None) -> dict:
    payload = {
        "profile_key": profile_key,
        "status": status,
        "checksum": checksum or (employee.current_checksum if employee else None),
    }
    if employee:
        payload.update(
            {
                "profile_id": employee.profile_id,
                "full_name": employee.full_name,
                "canonical_url": employee.canonical_url,
                "dismissed_at": employee.dismissed_at.isoformat() if employee.dismissed_at else None,
            }
        )
    return payload
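
The classification performed by `_delta_sync_payload` can be sketched without a database. The `Marker` class, the `diff_versions` function, and the sample keys below are hypothetical stand-ins for `DatasetVersionItem` rows; unlike the code in the diff, the sketch returns only profile keys and does not look up `Employee` records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    # Hypothetical stand-in for a DatasetVersionItem: only the compared fields.
    status: str
    checksum: str


def diff_versions(previous: dict[str, Marker], current: dict[str, Marker]) -> dict[str, list[str]]:
    changes: dict[str, list[str]] = {"added": [], "updated": [], "dismissed": [], "removed": []}
    for key, item in sorted(current.items()):
        before = previous.get(key)
        if before is None:
            changes["added"].append(key)
        elif before == item:
            # Same checksum and same status: nothing to report.
            continue
        elif item.status == "dismissed":
            changes["dismissed"].append(key)
        else:
            changes["updated"].append(key)
    for key in sorted(previous):
        if key not in current:
            changes["removed"].append(key)
    return changes


previous = {
    "a": Marker("active", "1"),
    "b": Marker("active", "2"),
    "c": Marker("active", "3"),
    "d": Marker("active", "4"),
}
current = {
    "a": Marker("active", "1"),
    "b": Marker("active", "2x"),
    "c": Marker("dismissed", "3"),
    "e": Marker("active", "5"),
}
changes = diff_versions(previous, current)
```

In this sample `b` lands in `updated`, `c` in `dismissed`, `e` in `added`, and `d` in `removed`, which mirrors the four buckets of the `changes` object returned in delta mode.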
--- a/app/services/resource_cache.py
+++ b/app/services/resource_cache.py
@@ -0,0 +1,147 @@
 from __future__ import annotations
 import gzip
 import hashlib
 import json
 from dataclasses import dataclass
 from datetime import datetime, timezone
 from typing import Any
 import requests
 from sqlalchemy import select
 from sqlalchemy.orm import Session
 from app.models import ParseResourceCache
 from app.version import BACKEND_VERSION
 @dataclass(frozen=True)
 class CachedResource:
    text: str
    body_hash: str
    from_cache: bool
    status_code: int
 class ResourceCache:
    def __init__(self, db: Session):
        self.db = db
    def fetch_text(
        self,
        session: requests.Session,
        *,
        profile_key: str,
        resource_key: str,
        method: str,
        url: str,
        headers: dict[str, str],
        timeout: int,
        json_payload: Any | None = None,
        params: dict[str, Any] | None = None,
    ) -> CachedResource:
        method = method.upper()
        fingerprint = _request_fingerprint(method=method, url=url, json_payload=json_payload, params=params)
        cached = self.db.scalar(
            select(ParseResourceCache).where(
                ParseResourceCache.profile_key == profile_key,
                ParseResourceCache.resource_key == resource_key,
                ParseResourceCache.request_fingerprint == fingerprint,
            )
        )
        request_headers = dict(headers)
        if cached:
            if cached.etag:
                request_headers["If-None-Match"] = cached.etag
            if cached.last_modified:
                request_headers["If-Modified-Since"] = cached.last_modified
        response = _send(
            session,
            method=method,
            url=url,
            headers=request_headers,
            timeout=timeout,
            json_payload=json_payload,
            params=params,
        )
        if response.status_code == 304 and cached:
            cached.fetched_at = datetime.now(timezone.utc)
            self.db.flush()
            return CachedResource(
                text=gzip.decompress(cached.body_snapshot).decode("utf-8"),
                body_hash=cached.body_hash,
                from_cache=True,
                status_code=response.status_code,
            )
        response.raise_for_status()
        text = response.text
        body_hash = _body_hash(text)
        etag = response.headers.get("ETag") if hasattr(response, "headers") else None
        last_modified = response.headers.get("Last-Modified") if hasattr(response, "headers") else None
        if cached:
            cached.method = method
            cached.url = url
            cached.etag = etag
            cached.last_modified = last_modified
            cached.body_hash = body_hash
            cached.body_snapshot = gzip.compress(text.encode("utf-8"))
            cached.parser_version = BACKEND_VERSION
            cached.fetched_at = datetime.now(timezone.utc)
        else:
            self.db.add(
                ParseResourceCache(
                    profile_key=profile_key,
                    resource_key=resource_key,
                    method=method,
                    url=url,
                    request_fingerprint=fingerprint,
                    etag=etag,
                    last_modified=last_modified,
                    body_hash=body_hash,
                    body_snapshot=gzip.compress(text.encode("utf-8")),
                    parser_version=BACKEND_VERSION,
                    fetched_at=datetime.now(timezone.utc),
                )
            )
        self.db.flush()
        return CachedResource(text=text, body_hash=body_hash, from_cache=False, status_code=response.status_code)
 def _send(
    session: requests.Session,
    *,
    method: str,
    url: str,
    headers: dict[str, str],
    timeout: int,
    json_payload: Any | None,
    params: dict[str, Any] | None,
 ) -> requests.Response:
    if method == "POST":
        return session.post(url, json=json_payload, headers=headers, timeout=timeout, params=params)
    return session.get(url, headers=headers, timeout=timeout, params=params)
 def _request_fingerprint(
    *,
    method: str,
    url: str,
    json_payload: Any | None,
    params: dict[str, Any] | None,
 ) -> str:
    payload = {
        "method": method,
        "url": url,
        "json": json_payload,
        "params": params,
    }
    encoded = json.dumps(payload, ensure_ascii=False, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()
 def _body_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
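
The revalidation flow in `ResourceCache.fetch_text` can be tried without a network or a database. The sketch below is an assumption-laden simplification: `FakeResponse`, `fake_server`, the in-memory `cache` dict, and the example URL are all hypothetical, and only the `ETag` / `If-None-Match` path is modeled (the diff also handles `Last-Modified`). The fingerprint function follows the one in the diff.

```python
import hashlib
import json
from dataclasses import dataclass, field


def request_fingerprint(method: str, url: str, json_payload=None, params=None) -> str:
    # sort_keys makes the hash independent of dict key order.
    payload = {"method": method, "url": url, "json": json_payload, "params": params}
    encoded = json.dumps(payload, ensure_ascii=False, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


@dataclass
class FakeResponse:
    # Hypothetical minimal response object for the sketch.
    status_code: int
    text: str = ""
    headers: dict = field(default_factory=dict)


def fake_server(headers: dict) -> FakeResponse:
    # Hypothetical origin: answers 304 when the client already holds version "v1".
    if headers.get("If-None-Match") == '"v1"':
        return FakeResponse(304)
    return FakeResponse(200, "<html>profile</html>", {"ETag": '"v1"'})


def fetch_text(cache: dict, key: str) -> tuple[str, bool]:
    entry = cache.get(key)
    headers = {}
    if entry and entry["etag"]:
        headers["If-None-Match"] = entry["etag"]
    response = fake_server(headers)
    if response.status_code == 304 and entry:
        # Not modified: serve the stored body and report a cache hit.
        return entry["text"], True
    cache[key] = {"etag": response.headers.get("ETag"), "text": response.text}
    return response.text, False


cache: dict = {}
key = request_fingerprint("GET", "https://example.org/profile", params={"b": 2, "a": 1})
first = fetch_text(cache, key)
second = fetch_text(cache, key)
```

The first call stores the body and validator, and the second call is answered from the cache after the stub returns 304. Reordering the `params` keys yields the same fingerprint, so it maps to the same cache entry.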
--- a/app/static/admin.css
+++ b/app/static/admin.css
@@ -1,6 +1,8 @@
 .admin {
  margin: 0;
  min-height: 100vh;
  display: flex;
  flex-direction: column;
  color: #1f2937;
  background: #f6f7f9;
  font-family: Arial, sans-serif;
@@ -21,6 +23,11 @@
  font-size: 20px;
 }
 .admin__brand-link {
  color: inherit;
  text-decoration: none;
 }
 .admin__nav {
  display: flex;
  align-items: center;
@@ -34,6 +41,7 @@
 }
 .admin__main {
  flex: 1;
  width: min(1180px, calc(100% - 32px));
  margin: 28px auto;
 }
@@ -52,18 +60,30 @@
 }
 .metric {
  display: block;
  padding: 18px;
  background: #ffffff;
  border: 1px solid #d9dee7;
  border-radius: 8px;
 }
 .metric--link {
  color: inherit;
  text-decoration: none;
 }
 .metric--link:hover {
  border-color: #0f766e;
 }
 .metric__label {
  display: block;
  color: #6b7280;
  font-size: 13px;
 }
 .metric__value {
  display: block;
  margin-top: 8px;
  font-size: 28px;
  font-weight: 700;
@@ -87,6 +107,14 @@
  border-collapse: collapse;
 }
 .table__row {
  cursor: pointer;
 }
 .table__row:hover {
  background: #f0fdfa;
 }
 .table__cell,
 .table__head {
  padding: 10px 8px;
@@ -143,6 +171,10 @@
  background: transparent;
 }
 .button--compact {
  padding: 8px 12px;
 }
 .code {
  overflow-x: auto;
  padding: 14px;
@@ -173,11 +205,34 @@
  gap: 10px;
 }
 .employee-card__actions {
  display: grid;
  justify-items: end;
  gap: 10px;
 }
 .employee-card__title {
  margin: 0;
  font-size: 24px;
 }
 .employee-card__notice {
  margin: 0;
  padding: 12px 14px;
  border-radius: 8px;
  font-weight: 700;
 }
 .employee-card__notice--success {
  color: #065f46;
  background: #d1fae5;
 }
 .employee-card__notice--error {
  color: #991b1b;
  background: #fee2e2;
 }
 .employee-card__section {
  padding: 20px;
  background: #ffffff;
@@ -270,6 +325,18 @@
  line-height: 1.55;
 }
 .employee-section__meta {
  display: flex;
  flex-wrap: wrap;
  gap: 8px 12px;
  color: #4b5563;
  font-size: 13px;
 }
 .employee-section__meta-item {
  line-height: 1.4;
 }
 .employee-section__table-wrap {
  overflow-x: auto;
 }
@@ -319,12 +386,22 @@
 }
 .stats-strip__item {
  display: block;
  padding: 14px 16px;
  background: #ffffff;
  border: 1px solid #d9dee7;
  border-radius: 8px;
 }
 .stats-strip__item--link {
  color: inherit;
  text-decoration: none;
 }
 .stats-strip__item--link:hover {
  border-color: #0f766e;
 }
 .stats-strip__label {
  display: block;
  color: #6b7280;
--- a/app/static/admin.js
+++ b/app/static/admin.js
@@ -59,10 +59,23 @@
        applyColumns(columns);
      });
    });
  }
  function setupClickableRows() {
    const openRow = (row) => {
      window.location.href = row.dataset.rowHref;
    };
    document.querySelectorAll("[data-row-href]").forEach((row) => {
      row.addEventListener("click", (event) => {
        if (event.target.closest("a, button, input, select, label")) return;
-        window.location.href = row.dataset.rowHref;
+        openRow(row);
      });
      row.addEventListener("keydown", (event) => {
        if (!["Enter", " "].includes(event.key)) return;
        if (event.target.closest("a, button, input, select, label")) return;
        event.preventDefault();
        openRow(row);
      });
    });
  }
@@ -76,12 +89,14 @@
      const status = document.querySelector("[data-progress-status]");
      const processed = document.querySelector("[data-progress-processed]");
      const found = document.querySelector("[data-progress-found]");
+      const skipped = document.querySelector("[data-progress-skipped]");
      const errors = document.querySelector("[data-progress-errors]");
      const fill = document.querySelector("[data-progress-fill]");
      const percent = document.querySelector("[data-progress-percent]");
-      if (status) status.textContent = run.status;
+      if (status) status.textContent = run.status_display || run.status;
      if (processed) processed.textContent = run.processed_count;
      if (found) found.textContent = run.found_count;
+      if (skipped) skipped.textContent = run.skipped_count;
      if (errors) errors.textContent = run.error_count;
      if (fill) fill.style.width = `${run.progress_percent}%`;
      if (percent) percent.textContent = run.progress_percent;
@@ -107,5 +122,6 @@
  }
  setupColumns();
  setupClickableRows();
  setupProgress();
 })();
--- a/app/templates/base.html
+++ b/app/templates/base.html
@@ -8,14 +8,13 @@
  </head>
  <body class="admin">
    <header class="admin__header">
-      <h1 class="admin__brand">MIEM Employees</h1>
+      <h1 class="admin__brand"><a class="admin__brand-link" href="/admin">MIEM Employees</a></h1>
      <nav class="admin__nav">
-        <a class="admin__link" href="/admin">Dashboard</a>
+        <a class="admin__link" href="/admin">Обзор</a>
-        <a class="admin__link" href="/admin/directory">Directory</a>
+        <a class="admin__link" href="/admin/directory">Сотрудники</a>
-        <a class="admin__link" href="/admin/employees">Employees</a>
+        <a class="admin__link" href="/admin/runs">Запуски</a>
-        <a class="admin__link" href="/admin/runs">Runs</a>
        <form method="post" action="/admin/logout">
-          <button class="button button--ghost" type="submit">Logout</button>
+          <button class="button button--ghost" type="submit">Выйти</button>
        </form>
      </nav>
    </header>
--- a/app/templates/dashboard.html
+++ b/app/templates/dashboard.html
@@ -1,43 +1,44 @@
 {% extends "base.html" %}
-{% block title %}Dashboard · MIEM Employees{% endblock %}
+{% block title %}Обзор · MIEM Employees{% endblock %}
 {% block content %}
 <section class="admin__grid">
-  <div class="metric"><div class="metric__label">Total</div><div class="metric__value">{{ counts.total }}</div></div>
+  <a class="metric metric--link" href="/admin/directory"><span class="metric__label">Всего в базе</span><span class="metric__value">{{ counts.total }}</span></a>
-  <div class="metric"><div class="metric__label">Active</div><div class="metric__value">{{ counts.active }}</div></div>
+  <a class="metric metric--link" href="/admin/directory?status=active"><span class="metric__label">Работают</span><span class="metric__value">{{ counts.active }}</span></a>
-  <div class="metric"><div class="metric__label">New in last run</div><div class="metric__value">{{ counts.new_in_last_run }}</div></div>
+  <a class="metric metric--link" href="{% if latest_run %}/admin/runs/{{ latest_run.id }}#new-employees{% else %}/admin/runs{% endif %}"><span class="metric__label">Новые за запуск</span><span class="metric__value">{{ counts.new_in_last_run }}</span></a>
-  <div class="metric"><div class="metric__label">Dismissed</div><div class="metric__value">{{ counts.dismissed }}</div></div>
+  <a class="metric metric--link" href="/admin/directory?status=dismissed"><span class="metric__label">Уволены</span><span class="metric__value">{{ counts.dismissed }}</span></a>
 </section>
 <section class="stats-strip">
  <div class="stats-strip__item">
-    <span class="stats-strip__label">Latest added</span>
+    <span class="stats-strip__label">Последний добавленный</span>
    {% if counts.latest_added %}
    <a class="stats-strip__value" href="/admin/employees/{{ counts.latest_added.id }}">{{ counts.latest_added.full_name or counts.latest_added.canonical_url }}</a>
    {% else %}
-    <span class="stats-strip__value">No employees yet</span>
+    <span class="stats-strip__value">Сотрудников пока нет</span>
    {% endif %}
  </div>
-  <div class="stats-strip__item">
+  <a class="stats-strip__item stats-strip__item--link" href="/admin/runs">
-    <span class="stats-strip__label">Runs</span>
+    <span class="stats-strip__label">Запуски</span>
    <span class="stats-strip__value">{{ counts.runs }}</span>
-  </div>
+  </a>
  <div class="stats-strip__item">
-    <span class="stats-strip__label">Errors</span>
+    <span class="stats-strip__label">Ошибки</span>
    <span class="stats-strip__value">{{ counts.errors }}</span>
  </div>
 </section>
 <section class="panel progress-panel" data-progress-panel>
  <div class="progress-panel__header">
-    <h2 class="panel__title">Parsing progress</h2>
+    <h2 class="panel__title">Прогресс парсинга</h2>
    <form method="post" action="/admin/crawl-now">
-      <button class="button" type="submit">Start crawl now</button>
+      <button class="button" type="submit">Запустить парсинг</button>
    </form>
  </div>
  {% set run = counts.current_running_run or latest_run %}
  <div class="progress-panel__body" data-progress-body>
    <div class="progress-panel__meta">
-      <span data-progress-status>{{ run.status if run else "idle" }}</span>
+      <span data-progress-status>{{ run.status_display if run else "Ожидание" }}</span>
-      <span><span data-progress-processed>{{ run.processed_count if run else 0 }}</span> / <span data-progress-found>{{ run.found_count if run else 0 }}</span> processed</span>
+      <span>обработано: <span data-progress-processed>{{ run.processed_count if run else 0 }}</span> / <span data-progress-found>{{ run.found_count if run else 0 }}</span></span>
-      <span><span data-progress-errors>{{ run.error_count if run else 0 }}</span> errors</span>
+      <span>без изменений: <span data-progress-skipped>{{ run.skipped_count if run else 0 }}</span></span>
+      <span>ошибок: <span data-progress-errors>{{ run.error_count if run else 0 }}</span></span>
    </div>
    <div class="progress-bar" aria-label="Parsing progress">
      <div class="progress-bar__fill" data-progress-fill style="width: {{ run.progress_percent if run else 0 }}%"></div>
@@ -46,12 +47,12 @@
  </div>
 </section>
 <section class="panel">
-  <h2 class="panel__title">Latest runs</h2>
+  <h2 class="panel__title">Последние запуски</h2>
  <table class="table">
-    <thead><tr><th class="table__head">ID</th><th class="table__head">Status</th><th class="table__head">Parsed</th><th class="table__head">Errors</th><th class="table__head">Started</th></tr></thead>
+    <thead><tr><th class="table__head">ID</th><th class="table__head">Статус</th><th class="table__head">Обработано</th><th class="table__head">Без изменений</th><th class="table__head">Ошибки</th><th class="table__head">Старт</th></tr></thead>
    <tbody>
      {% for run in runs %}
-      <tr><td class="table__cell">{{ run.id }}</td><td class="table__cell">{{ run.status }}</td><td class="table__cell">{{ run.parsed_count }}</td><td class="table__cell">{{ run.error_count }}</td><td class="table__cell">{{ run.started_at }}</td></tr>
+      <tr class="table__row" onclick="window.location.href='/admin/runs/{{ run.id }}'" onkeydown="if (event.key === 'Enter' || event.key === ' ') { event.preventDefault(); window.location.href='/admin/runs/{{ run.id }}'; }" role="link" tabindex="0"><td class="table__cell">{{ run.id }}</td><td class="table__cell">{{ run.status_display }}</td><td class="table__cell">{{ run.parsed_count }}</td><td class="table__cell">{{ run.skipped_count }}</td><td class="table__cell">{{ run.error_count }}</td><td class="table__cell">{{ run.started_display }}</td></tr>
      {% endfor %}
    </tbody>
  </table>
--- a/app/templates/directory.html
+++ b/app/templates/directory.html
@@ -1,65 +1,72 @@
 {% extends "base.html" %}
-{% block title %}Directory · MIEM Employees{% endblock %}
+{% block title %}Сотрудники · MIEM Employees{% endblock %}
 {% block content %}
 <section class="directory">
  <div class="directory__header">
    <div>
-      <h2 class="directory__title">Directory</h2>
+      <h2 class="directory__title">Сотрудники</h2>
-      <p class="directory__summary">{{ page.total }} employees found</p>
+      <p class="directory__summary">Найдено: {{ page.total }}</p>
    </div>
-    <button class="button" type="button" data-columns-open>Columns</button>
+    <button class="button" type="button" data-columns-open>Колонки</button>
  </div>
  <form class="directory__filters" method="get" action="/admin/directory">
-    <input class="directory__input" name="q" value="{{ filters.q }}" placeholder="Name or URL">
+    <input class="directory__input" name="q" value="{{ filters.q }}" placeholder="ФИО или ссылка">
    <select class="directory__input" name="status">
-      <option value="" {% if not filters.status %}selected{% endif %}>All statuses</option>
+      <option value="" {% if not filters.status %}selected{% endif %}>Все статусы</option>
-      <option value="active" {% if filters.status == "active" %}selected{% endif %}>Active</option>
+      <option value="active" {% if filters.status == "active" %}selected{% endif %}>Работает</option>
-      <option value="dismissed" {% if filters.status == "dismissed" %}selected{% endif %}>Dismissed</option>
+      <option value="dismissed" {% if filters.status == "dismissed" %}selected{% endif %}>Уволен</option>
    </select>
    <select class="directory__input" name="has_email">
-      <option value="" {% if not filters.has_email %}selected{% endif %}>Any email</option>
+      <option value="" {% if not filters.has_email %}selected{% endif %}>Любой email</option>
-      <option value="true" {% if filters.has_email == "true" %}selected{% endif %}>Has email</option>
+      <option value="true" {% if filters.has_email == "true" %}selected{% endif %}>Есть email</option>
-      <option value="false" {% if filters.has_email == "false" %}selected{% endif %}>No email</option>
+      <option value="false" {% if filters.has_email == "false" %}selected{% endif %}>Нет email</option>
    </select>
-    <input class="directory__input" type="date" name="started_from" value="{{ filters.started_from }}" aria-label="First seen from">
+    <input class="directory__input" type="date" name="started_from" value="{{ filters.started_from }}" aria-label="Впервые найден с">
-    <input class="directory__input" type="date" name="started_to" value="{{ filters.started_to }}" aria-label="First seen to">
+    <input class="directory__input" type="date" name="started_to" value="{{ filters.started_to }}" aria-label="Впервые найден по">
    <select class="directory__input" name="sort">
-      {% for value, label in [("full_name", "Name"), ("status", "Status"), ("hse_start_year", "HSE start"), ("first_seen_at", "First seen"), ("last_seen_at", "Last seen"), ("dismissed_at", "Dismissed")] %}
+      {% for value, label in [("full_name", "ФИО"), ("status", "Статус"), ("hse_start_year", "Год начала"), ("first_seen_at", "Впервые найден"), ("last_seen_at", "Последний раз найден"), ("dismissed_at", "Дата увольнения")] %}
-      <option value="{{ value }}" {% if filters.sort == value %}selected{% endif %}>Sort: {{ label }}</option>
+      <option value="{{ value }}" {% if filters.sort == value %}selected{% endif %}>Сортировка: {{ label }}</option>
      {% endfor %}
    </select>
    <select class="directory__input" name="direction">
-      <option value="asc" {% if filters.direction == "asc" %}selected{% endif %}>Ascending</option>
+      <option value="asc" {% if filters.direction == "asc" %}selected{% endif %}>По возрастанию</option>
-      <option value="desc" {% if filters.direction == "desc" %}selected{% endif %}>Descending</option>
+      <option value="desc" {% if filters.direction == "desc" %}selected{% endif %}>По убыванию</option>
    </select>
-    <button class="button" type="submit">Apply</button>
+    <select class="directory__input" name="limit" onchange="this.form.offset.value = 0; this.form.submit()">
+      {% for value in [25, 50, 100] %}
+      <option value="{{ value }}" {% if filters.limit == value %}selected{% endif %}>На странице: {{ value }}</option>
+      {% endfor %}
+    </select>
+    <input type="hidden" name="offset" value="{{ filters.offset }}">
+    <button class="button" type="submit">Применить</button>
  </form>
  <div class="directory__table-wrap">
    <table class="directory-table" data-directory-table>
      <thead>
        <tr>
-          <th class="directory-table__head" data-column="full_name">Name</th>
+          <th class="directory-table__head" data-column="full_name">ФИО</th>
-          <th class="directory-table__head" data-column="status">Status</th>
+          <th class="directory-table__head" data-column="status">Статус</th>
-          <th class="directory-table__head" data-column="positions">Positions</th>
+          <th class="directory-table__head" data-column="positions">Должности</th>
-          <th class="directory-table__head" data-column="hse_start_year">HSE start</th>
+          <th class="directory-table__head" data-column="hse_start_year">Год начала</th>
          <th class="directory-table__head" data-column="email">Email</th>
-          <th class="directory-table__head" data-column="phone">Phone</th>
+          <th class="directory-table__head" data-column="phone">Телефон</th>
-          <th class="directory-table__head" data-column="address">Address</th>
+          <th class="directory-table__head" data-column="address">Адрес</th>
-          <th class="directory-table__head" data-column="publications_count">Publications</th>
+          <th class="directory-table__head" data-column="publications_count">Публикации</th>
-          <th class="directory-table__head" data-column="courses_count">Courses</th>
+          <th class="directory-table__head" data-column="courses_count">Курсы</th>
-          <th class="directory-table__head" data-column="first_seen_at">First seen</th>
+          <th class="directory-table__head" data-column="news_count">Новости</th>
-          <th class="directory-table__head" data-column="last_seen_at">Last seen</th>
+          <th class="directory-table__head" data-column="first_seen_at">Впервые найден</th>
-          <th class="directory-table__head" data-column="dismissed_at">Dismissed</th>
+          <th class="directory-table__head" data-column="last_seen_at">Последний раз найден</th>
-          <th class="directory-table__head" data-column="profile">Profile</th>
+          <th class="directory-table__head" data-column="dismissed_at">Дата увольнения</th>
+          <th class="directory-table__head" data-column="profile">Профиль</th>
        </tr>
      </thead>
      <tbody>
-        {% for employee in page.items %}
+        {% for employee in page.employees %}
        <tr class="directory-table__row" data-row-href="/admin/employees/{{ employee.id }}">
-          <td class="directory-table__cell" data-column="full_name">{{ employee.full_name or "No name" }}</td>
+          <td class="directory-table__cell" data-column="full_name">{{ employee.full_name or "Без имени" }}</td>
-          <td class="directory-table__cell" data-column="status"><span class="badge {% if employee.status == "dismissed" %}badge--dismissed{% endif %}">{{ employee.status }}</span></td>
+          <td class="directory-table__cell" data-column="status"><span class="badge {% if employee.status == "dismissed" %}badge--dismissed{% endif %}">{{ employee.status_display }}</span></td>
          <td class="directory-table__cell" data-column="positions">{{ employee.positions_text }}</td>
          <td class="directory-table__cell" data-column="hse_start_year">{{ employee.hse_start_year or "" }}</td>
          <td class="directory-table__cell" data-column="email">{{ employee.email_text }}</td>
@@ -67,13 +74,14 @@
          <td class="directory-table__cell" data-column="address">{{ employee.address or "" }}</td>
          <td class="directory-table__cell" data-column="publications_count">{{ employee.publications_count }}</td>
          <td class="directory-table__cell" data-column="courses_count">{{ employee.courses_count }}</td>
-          <td class="directory-table__cell" data-column="first_seen_at">{{ employee.first_seen_at or "" }}</td>
+          <td class="directory-table__cell" data-column="news_count">{{ employee.news_count }}</td>
-          <td class="directory-table__cell" data-column="last_seen_at">{{ employee.last_seen_at or "" }}</td>
+          <td class="directory-table__cell" data-column="first_seen_at">{{ employee.first_seen_display }}</td>
-          <td class="directory-table__cell" data-column="dismissed_at">{{ employee.dismissed_at or "" }}</td>
+          <td class="directory-table__cell" data-column="last_seen_at">{{ employee.last_seen_display }}</td>
-          <td class="directory-table__cell" data-column="profile"><a class="admin__link" href="{{ employee.canonical_url }}">Open</a></td>
+          <td class="directory-table__cell" data-column="dismissed_at">{{ employee.dismissed_display }}</td>
+          <td class="directory-table__cell" data-column="profile"><a class="admin__link" href="{{ employee.canonical_url }}">Открыть</a></td>
        </tr>
        {% else %}
-        <tr><td class="directory-table__empty" colspan="13">No employees match these filters.</td></tr>
+        <tr><td class="directory-table__empty" colspan="14">По этим фильтрам сотрудники не найдены.</td></tr>
        {% endfor %}
      </tbody>
    </table>
@@ -83,24 +91,24 @@
    {% set prev_offset = filters.offset - filters.limit %}
    {% set next_offset = filters.offset + filters.limit %}
    {% if filters.offset > 0 %}
-    <a class="admin__link" href="{{ request.url.include_query_params(offset=prev_offset) }}">Previous</a>
+    <a class="admin__link" href="{{ request.url.include_query_params(offset=prev_offset) }}">Назад</a>
    {% endif %}
-    <span class="directory__page">Page {{ page.page }}{% if page.pages %} of {{ page.pages }}{% endif %}</span>
+    <span class="directory__page">Страница {{ page.page }}{% if page.pages %} из {{ page.pages }}{% endif %}</span>
    {% if next_offset < page.total %}
-    <a class="admin__link" href="{{ request.url.include_query_params(offset=next_offset) }}">Next</a>
+    <a class="admin__link" href="{{ request.url.include_query_params(offset=next_offset) }}">Вперед</a>
    {% endif %}
  </div>
 </section>
 <div class="columns-modal" data-columns-modal hidden>
  <div class="columns-modal__backdrop" data-columns-close></div>
-  <section class="columns-modal__panel" aria-label="Column settings">
+  <section class="columns-modal__panel" aria-label="Настройка колонок">
    <div class="columns-modal__header">
-      <h3 class="columns-modal__title">Visible columns</h3>
+      <h3 class="columns-modal__title">Отображаемые колонки</h3>
-      <button class="button button--ghost" type="button" data-columns-close>Close</button>
+      <button class="button button--ghost" type="button" data-columns-close>Закрыть</button>
    </div>
    <div class="columns-modal__grid">
-      {% for key, label in [("full_name", "Name"), ("status", "Status"), ("positions", "Positions"), ("hse_start_year", "HSE start"), ("email", "Email"), ("phone", "Phone"), ("address", "Address"), ("publications_count", "Publications"), ("courses_count", "Courses"), ("first_seen_at", "First seen"), ("last_seen_at", "Last seen"), ("dismissed_at", "Dismissed"), ("profile", "Profile")] %}
+      {% for key, label in [("full_name", "ФИО"), ("status", "Статус"), ("positions", "Должности"), ("hse_start_year", "Год начала"), ("email", "Email"), ("phone", "Телефон"), ("address", "Адрес"), ("publications_count", "Публикации"), ("courses_count", "Курсы"), ("news_count", "Новости"), ("first_seen_at", "Впервые найден"), ("last_seen_at", "Последний раз найден"), ("dismissed_at", "Дата увольнения"), ("profile", "Профиль")] %}
      <label class="columns-modal__option"><input class="columns-modal__checkbox" type="checkbox" value="{{ key }}" data-column-toggle> {{ label }}</label>
      {% endfor %}
    </div>
--- a/app/templates/employee_detail.html
+++ b/app/templates/employee_detail.html
@@ -5,10 +5,20 @@
  <div class="employee-card__header">
    <div class="employee-card__identity">
      <h2 class="employee-card__title">{{ employee_view.full_name or employee.profile_key }}</h2>
-      <span class="badge {% if employee_view.status == "dismissed" %}badge--dismissed{% endif %}">{{ employee_view.status }}</span>
+      <span class="badge {% if employee_view.status == "dismissed" %}badge--dismissed{% endif %}">{{ employee_view.status_display }}</span>
    </div>
    <div class="employee-card__actions">
+      <form method="post" action="/admin/employees/{{ employee.id }}/refresh">
+        <button class="button button--compact" type="submit">Обновить данные</button>
+      </form>
      <a class="admin__link" href="{{ employee_view.canonical_url }}">{{ employee_view.canonical_url }}</a>
    </div>
  </div>
+  {% if refresh_status == "success" %}
+  <p class="employee-card__notice employee-card__notice--success">Данные сотрудника обновлены.</p>
+  {% elif refresh_status == "error" %}
+  <p class="employee-card__notice employee-card__notice--error">Не удалось обновить данные сотрудника.</p>
+  {% endif %}
  <section class="employee-card__section">
    <h3 class="employee-section__title">Основная информация</h3>
@@ -28,12 +38,11 @@
        </dd>
      </div>
      <div class="employee-card__meta-item"><dt class="employee-card__meta-label">Год начала работы в ВШЭ</dt><dd class="employee-card__meta-value">{{ employee_view.hse_start_year or "Не указано" }}</dd></div>
-      <div class="employee-card__meta-item"><dt class="employee-card__meta-label">Profile type</dt><dd class="employee-card__meta-value">{{ employee_view.profile_type or "Не указано" }}</dd></div>
+      <div class="employee-card__meta-item"><dt class="employee-card__meta-label">Тип профиля</dt><dd class="employee-card__meta-value">{{ employee_view.profile_type or "Не указано" }}</dd></div>
-      <div class="employee-card__meta-item"><dt class="employee-card__meta-label">Profile ID</dt><dd class="employee-card__meta-value">{{ employee_view.profile_id or "Не указано" }}</dd></div>
+      <div class="employee-card__meta-item"><dt class="employee-card__meta-label">ID профиля</dt><dd class="employee-card__meta-value">{{ employee_view.profile_id or "Не указано" }}</dd></div>
-      <div class="employee-card__meta-item"><dt class="employee-card__meta-label">First seen</dt><dd class="employee-card__meta-value">{{ employee_view.first_seen_at or "Не указано" }}</dd></div>
+      <div class="employee-card__meta-item"><dt class="employee-card__meta-label">Впервые найден</dt><dd class="employee-card__meta-value">{{ employee_view.first_seen_display }}</dd></div>
-      <div class="employee-card__meta-item"><dt class="employee-card__meta-label">Last seen</dt><dd class="employee-card__meta-value">{{ employee_view.last_seen_at or "Не указано" }}</dd></div>
+      <div class="employee-card__meta-item"><dt class="employee-card__meta-label">Последний раз найден</dt><dd class="employee-card__meta-value">{{ employee_view.last_seen_display }}</dd></div>
-      <div class="employee-card__meta-item"><dt class="employee-card__meta-label">Dismissed at</dt><dd class="employee-card__meta-value">{{ employee_view.dismissed_at or "Не указано" }}</dd></div>
+      <div class="employee-card__meta-item"><dt class="employee-card__meta-label">Дата увольнения</dt><dd class="employee-card__meta-value">{{ employee_view.dismissed_display }}</dd></div>
      <div class="employee-card__meta-item"><dt class="employee-card__meta-label">Parser version</dt><dd class="employee-card__meta-value">{{ employee_view.parser_version or "Не указано" }}</dd></div>
    </dl>
  </section>
@@ -95,6 +104,25 @@
  </section>
  {% endif %}
+  {% if employee_view.news_links %}
+  <section class="employee-card__section">
+    <h3 class="employee-section__title">В новостях</h3>
+    <ul class="employee-card__list">
+      {% for news in employee_view.news_links %}
+      <li class="employee-card__list-item">
+        {% if news.published_display %}<div class="employee-section__meta"><span class="employee-section__meta-item">{{ news.published_display }}</span></div>{% endif %}
+        {% if news.url %}
+        <a class="admin__link" href="{{ news.url }}">{{ news.title }}</a>
+        {% else %}
+        {{ news.title }}
+        {% endif %}
+        {% if news.summary %}<div class="employee-section__text">{{ news.summary }}</div>{% endif %}
+      </li>
+      {% endfor %}
+    </ul>
+  </section>
+  {% endif %}
  <section class="employee-card__section">
    <h3 class="employee-section__title">Разделы профиля</h3>
    {% if employee_view.sections %}
@@ -139,6 +167,34 @@
          </li>
          {% endfor %}
        </ul>
+        {% elif section.type == "graduation_theses" and section.theses %}
+        {% if section.theses_count %}<p class="employee-section__note">Всего: {{ section.theses_count }}</p>{% endif %}
+        <ul class="employee-card__list">
+          {% for thesis in section.theses %}
+          <li class="employee-card__list-item">
+            {% if thesis.student %}<strong>{{ thesis.student }}</strong>{% endif %}
+            {% if thesis.title %}
+            <div class="employee-section__text">
+              {% if thesis.project_url %}
+              <a class="admin__link" href="{{ thesis.project_url }}">{{ thesis.title }}</a>
+              {% else %}
+              {{ thesis.title }}
+              {% endif %}
+            </div>
+            {% endif %}
+            <div class="employee-section__meta">
+              {% if thesis.defense_year %}<span class="employee-section__meta-item">Год защиты: {{ thesis.defense_year }}</span>{% endif %}
+              {% if thesis.level %}<span class="employee-section__meta-item">{{ thesis.level }}</span>{% endif %}
+              {% if thesis.rating is not none %}<span class="employee-section__meta-item">Оценка: {{ thesis.rating }}</span>{% endif %}
+              {% if thesis.program %}
+              <span class="employee-section__meta-item">
+                {% if thesis.program_url %}<a class="admin__link" href="{{ thesis.program_url }}">{{ thesis.program }}</a>{% else %}{{ thesis.program }}{% endif %}
+              </span>
+              {% endif %}
+            </div>
+          </li>
+          {% endfor %}
+        </ul>
        {% elif section.type == "table" and section.table %}
        <div class="employee-section__table-wrap">
          <table class="employee-section__table">
@@ -162,16 +218,16 @@
        <p class="employee-section__text">{{ paragraph }}</p>
        {% endfor %}
        {% endif %}
-        {% if section.items %}
+        {% if section.list_items %}
        <ul class="employee-card__list">
-          {% for item in section.items %}
+          {% for item in section.list_items %}
          <li class="employee-card__list-item">{{ item }}</li>
          {% endfor %}
        </ul>
        {% endif %}
        {% endif %}
-        {% if section.links and section.type not in ["courses_by_year"] %}
+        {% if section.links and section.type not in ["courses_by_year", "graduation_theses"] %}
        <div class="employee-section__links">
          {% for link in section.links %}
          <a class="employee-section__link" href="{{ link.url }}">{{ link.text }}</a>
@@ -188,12 +244,12 @@
 </section>
 <section class="panel">
-  <h2 class="panel__title">Snapshots</h2>
+  <h2 class="panel__title">Снапшоты</h2>
  <table class="table">
-    <thead><tr><th class="table__head">Captured</th><th class="table__head">Checksum</th><th class="table__head">Parser</th></tr></thead>
+    <thead><tr><th class="table__head">Дата</th><th class="table__head">Checksum</th><th class="table__head">Парсер</th></tr></thead>
    <tbody>
      {% for snapshot in snapshots %}
-      <tr><td class="table__cell">{{ snapshot.captured_at }}</td><td class="table__cell">{{ snapshot.checksum }}</td><td class="table__cell">{{ snapshot.parser_version }}</td></tr>
+      <tr><td class="table__cell">{{ snapshot.captured_display }}</td><td class="table__cell">{{ snapshot.checksum }}</td><td class="table__cell">{{ snapshot.parser_version }}</td></tr>
      {% endfor %}
    </tbody>
  </table>
--- a/app/templates/employees.html
+++ b/app/templates/employees.html
@@ -1,29 +0,0 @@
-{% extends "base.html" %}
-{% block title %}Employees · MIEM Employees{% endblock %}
-{% block content %}
-<section class="panel">
-  <h2 class="panel__title">Employees</h2>
-  <form class="form" method="get" action="/admin/employees">
-    <input class="form__input" name="q" value="{{ q }}" placeholder="Name or URL">
-    <select class="form__select" name="status">
-      <option value="" {% if not status %}selected{% endif %}>All</option>
-      <option value="active" {% if status == "active" %}selected{% endif %}>Active</option>
-      <option value="dismissed" {% if status == "dismissed" %}selected{% endif %}>Dismissed</option>
-    </select>
-    <button class="button" type="submit">Search</button>
-  </form>
-  <table class="table">
-    <thead><tr><th class="table__head">Name</th><th class="table__head">Status</th><th class="table__head">Last seen</th><th class="table__head">Profile</th></tr></thead>
-    <tbody>
-      {% for employee in employees %}
-      <tr>
-        <td class="table__cell"><a class="admin__link" href="/admin/employees/{{ employee.id }}">{{ employee.full_name or employee.profile_key }}</a></td>
-        <td class="table__cell"><span class="badge {% if employee.status == "dismissed" %}badge--dismissed{% endif %}">{{ employee.status }}</span></td>
-        <td class="table__cell">{{ employee.last_seen_at }}</td>
-        <td class="table__cell"><a class="admin__link" href="{{ employee.canonical_url }}">{{ employee.canonical_url }}</a></td>
-      </tr>
-      {% endfor %}
-    </tbody>
-  </table>
-</section>
-{% endblock %}
--- a/app/templates/login.html
+++ b/app/templates/login.html
@@ -3,18 +3,18 @@
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
-    <title>Login · MIEM Employees</title>
+    <title>Вход · MIEM Employees</title>
    <link rel="stylesheet" href="/static/admin.css">
  </head>
  <body class="admin">
    <main class="admin__main">
      <section class="panel">
-        <h1 class="panel__title">Admin login</h1>
+        <h1 class="panel__title">Вход в админку</h1>
        {% if error %}<p>{{ error }}</p>{% endif %}
        <form class="form" method="post" action="/admin/login">
-          <label class="form__label">Login <input class="form__input" name="username" autocomplete="username"></label>
+          <label class="form__label">Логин <input class="form__input" name="username" autocomplete="username"></label>
-          <label class="form__label">Password <input class="form__input" name="password" type="password" autocomplete="current-password"></label>
+          <label class="form__label">Пароль <input class="form__input" name="password" type="password" autocomplete="current-password"></label>
-          <button class="button" type="submit">Sign in</button>
+          <button class="button" type="submit">Войти</button>
        </form>
      </section>
    </main>
--- a/app/templates/run_detail.html
+++ b/app/templates/run_detail.html
@@ -0,0 +1,65 @@
+{% extends "base.html" %}
+{% block title %}Запуск {{ run.id }} · MIEM Employees{% endblock %}
+{% block content %}
+<section class="panel">
+  <div class="progress-panel__header">
+    <div>
+      <h2 class="panel__title">Запуск {{ run.id }}</h2>
+      <p class="progress-panel__empty">{{ run.started_display }} · {{ run.status_display }}</p>
+    </div>
+    <a class="admin__link" href="/admin/runs">Все запуски</a>
+  </div>
+  <div class="stats-strip">
+    <div class="stats-strip__item"><span class="stats-strip__label">Найдено</span><span class="stats-strip__value">{{ run.found_count }}</span></div>
+    <div class="stats-strip__item"><span class="stats-strip__label">Обработано</span><span class="stats-strip__value">{{ run.parsed_count }}</span></div>
+    <div class="stats-strip__item"><span class="stats-strip__label">Без изменений</span><span class="stats-strip__value">{{ run.skipped_count }}</span></div>
+    <div class="stats-strip__item"><span class="stats-strip__label">Новые</span><span class="stats-strip__value">{{ run.new_count }}</span></div>
+    <div class="stats-strip__item"><span class="stats-strip__label">Потеряшки</span><span class="stats-strip__value">{{ run.changes.missing_from_source | length }}</span></div>
+    <div class="stats-strip__item"><span class="stats-strip__label">Уволены</span><span class="stats-strip__value">{{ run.dismissed_count }}</span></div>
+    <div class="stats-strip__item"><span class="stats-strip__label">Ошибки</span><span class="stats-strip__value">{{ run.error_count }}</span></div>
+  </div>
+  {% if not run.changes_detail_available %}
+  <p class="progress-panel__empty">Детализация сотрудников для этого запуска недоступна. Она сохраняется только для новых запусков после обновления.</p>
+  {% endif %}
+</section>
+{% for group, title in [("new", "Новые сотрудники"), ("missing_from_source", "Потеряшки"), ("dismissed", "Уволенные")] %}
+<section class="panel"{% if group == "new" %} id="new-employees"{% endif %}>
+  <h2 class="panel__title">{{ title }}</h2>
+  {% set items = run.changes[group] %}
+  {% if items %}
+  <table class="table">
+    <thead><tr><th class="table__head">ФИО</th><th class="table__head">Профиль</th><th class="table__head">Проверка</th><th class="table__head">Комментарий</th></tr></thead>
+    <tbody>
+      {% for item in items %}
+      <tr>
+        <td class="table__cell">{% if item.employee_id %}<a class="admin__link" href="/admin/employees/{{ item.employee_id }}">{{ item.full_name or item.profile_key }}</a>{% else %}{{ item.full_name or item.profile_key }}{% endif %}</td>
+        <td class="table__cell"><a class="admin__link" href="{{ item.profile_url }}">{{ item.profile_url }}</a></td>
+        <td class="table__cell">{{ item.profile_available_display }}</td>
+        <td class="table__cell">{{ item.message or "" }}</td>
+      </tr>
+      {% endfor %}
+    </tbody>
+  </table>
+  {% else %}
+  <p class="progress-panel__empty">Нет записей.</p>
+  {% endif %}
+</section>
+{% endfor %}
+<section class="panel">
+  <h2 class="panel__title">Ошибки запуска</h2>
+  {% if run.errors %}
+  <table class="table">
+    <thead><tr><th class="table__head">Профиль</th><th class="table__head">Ошибка</th><th class="table__head">Время</th></tr></thead>
+    <tbody>
+      {% for error in run.errors %}
+      <tr><td class="table__cell">{{ error.profile_url or "" }}</td><td class="table__cell">{{ error.error_type }}: {{ error.message }}</td><td class="table__cell">{{ error.created_display }}</td></tr>
+      {% endfor %}
+    </tbody>
+  </table>
+  {% else %}
+  <p class="progress-panel__empty">Ошибок нет.</p>
+  {% endif %}
+</section>
+{% endblock %}
--- a/app/templates/runs.html
+++ b/app/templates/runs.html
@@ -1,20 +1,21 @@
 {% extends "base.html" %}
-{% block title %}Runs · MIEM Employees{% endblock %}
+{% block title %}Запуски · MIEM Employees{% endblock %}
 {% block content %}
 <section class="panel">
  <div class="progress-panel__header">
-    <h2 class="panel__title">Crawl runs</h2>
+    <h2 class="panel__title">Запуски парсинга</h2>
-    <form method="post" action="/admin/runs"><button class="button" type="submit">Start crawl now</button></form>
+    <form method="post" action="/admin/runs"><button class="button" type="submit">Запустить парсинг</button></form>
  </div>
  {% set run = runs[0] if runs else none %}
  {% if run %}
-  {% set processed = run.parsed_count + run.error_count %}
+  {% set processed = run.parsed_count + run.skipped_count + run.error_count %}
  {% set percent = ((processed / run.found_count) * 100) | round(1) if run.found_count else 0 %}
  <div class="progress-panel" data-progress-panel>
    <div class="progress-panel__meta">
-      <span data-progress-status>{{ run.status }}</span>
+      <span data-progress-status>{{ run.status_display }}</span>
-      <span><span data-progress-processed>{{ processed }}</span> / <span data-progress-found>{{ run.found_count }}</span> processed</span>
+      <span>обработано: <span data-progress-processed>{{ processed }}</span> / <span data-progress-found>{{ run.found_count }}</span></span>
-      <span><span data-progress-errors>{{ run.error_count }}</span> errors</span>
+      <span>без изменений: <span data-progress-skipped>{{ run.skipped_count }}</span></span>
+      <span>ошибок: <span data-progress-errors>{{ run.error_count }}</span></span>
    </div>
    <div class="progress-bar" aria-label="Parsing progress">
      <div class="progress-bar__fill" data-progress-fill style="width: {{ percent }}%"></div>
@@ -24,9 +25,10 @@
  {% else %}
  <div class="progress-panel" data-progress-panel>
    <div class="progress-panel__meta">
-      <span data-progress-status>idle</span>
+      <span data-progress-status>Ожидание</span>
-      <span><span data-progress-processed>0</span> / <span data-progress-found>0</span> processed</span>
+      <span>обработано: <span data-progress-processed>0</span> / <span data-progress-found>0</span></span>
-      <span><span data-progress-errors>0</span> errors</span>
+      <span>без изменений: <span data-progress-skipped>0</span></span>
+      <span>ошибок: <span data-progress-errors>0</span></span>
    </div>
    <div class="progress-bar" aria-label="Parsing progress">
      <div class="progress-bar__fill" data-progress-fill style="width: 0%"></div>
@@ -35,18 +37,18 @@
  </div>
  {% endif %}
  <table class="table">
-    <thead><tr><th class="table__head">ID</th><th class="table__head">Status</th><th class="table__head">Found</th><th class="table__head">Parsed</th><th class="table__head">New</th><th class="table__head">Errors</th><th class="table__head">Dismissed</th></tr></thead>
+    <thead><tr><th class="table__head">ID</th><th class="table__head">Статус</th><th class="table__head">Найдено</th><th class="table__head">Обработано</th><th class="table__head">Без изменений</th><th class="table__head">Новые</th><th class="table__head">Ошибки</th><th class="table__head">Уволены</th><th class="table__head">Старт</th></tr></thead>
    <tbody>
      {% for run in runs %}
-      <tr><td class="table__cell">{{ run.id }}</td><td class="table__cell">{{ run.status }}</td><td class="table__cell">{{ run.found_count }}</td><td class="table__cell">{{ run.parsed_count }}</td><td class="table__cell">{{ run.new_count }}</td><td class="table__cell">{{ run.error_count }}</td><td class="table__cell">{{ run.dismissed_count }}</td></tr>
+      <tr class="table__row" onclick="window.location.href='/admin/runs/{{ run.id }}'" onkeydown="if (event.key === 'Enter' || event.key === ' ') { event.preventDefault(); window.location.href='/admin/runs/{{ run.id }}'; }" role="link" tabindex="0"><td class="table__cell">{{ run.id }}</td><td class="table__cell">{{ run.status_display }}</td><td class="table__cell">{{ run.found_count }}</td><td class="table__cell">{{ run.parsed_count }}</td><td class="table__cell">{{ run.skipped_count }}</td><td class="table__cell">{{ run.new_count }}</td><td class="table__cell">{{ run.error_count }}</td><td class="table__cell">{{ run.dismissed_count }}</td><td class="table__cell">{{ run.started_display }}</td></tr>
      {% endfor %}
    </tbody>
  </table>
 </section>
 <section class="panel">
-  <h2 class="panel__title">Recent errors</h2>
+  <h2 class="panel__title">Последние ошибки</h2>
  <table class="table">
-    <thead><tr><th class="table__head">Run</th><th class="table__head">Profile</th><th class="table__head">Error</th></tr></thead>
+    <thead><tr><th class="table__head">Запуск</th><th class="table__head">Профиль</th><th class="table__head">Ошибка</th></tr></thead>
    <tbody>
      {% for error in errors %}
      <tr><td class="table__cell">{{ error.crawl_run_id }}</td><td class="table__cell">{{ error.profile_url }}</td><td class="table__cell">{{ error.error_type }}: {{ error.message }}</td></tr>
--- a/app/version.py
+++ b/app/version.py
@@ -1,3 +1,3 @@
-APP_VERSION = "0.2.4"
+APP_VERSION = "0.7.0"
-FRONTEND_VERSION = "0.2.4"
+FRONTEND_VERSION = "0.7.0"
-BACKEND_VERSION = "0.2.4"
+BACKEND_VERSION = "0.7.0"
--- a/app/worker.py
+++ b/app/worker.py
@@ -17,7 +17,14 @@ def crawl_once() -> None:
    settings = get_settings()
    with SessionLocal() as db:
        run = run_crawl(db, settings)
-        logger.info("crawl finished: id=%s status=%s parsed=%s errors=%s", run.id, run.status, run.parsed_count, run.error_count)
+        logger.info(
+            "crawl finished: id=%s status=%s parsed=%s skipped=%s errors=%s",
+            run.id,
+            run.status,
+            run.parsed_count,
+            run.skipped_count,
+            run.error_count,
+        )
 def main() -> None:
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -20,7 +20,7 @@ services:
    environment:
      DATABASE_URL: postgresql+psycopg://${POSTGRES_USER:-miem}:${POSTGRES_PASSWORD:-miem_password}@postgres:5432/${POSTGRES_DB:-miem_workers}
    ports:
-      - "127.0.0.1:8000:8000"
+      - "127.0.0.1:${API_PORT:-8000}:8000"
    depends_on:
      postgres:
        condition: service_healthy
@@ -42,7 +42,7 @@ services:
    environment:
      DATABASE_URL: postgresql+psycopg://${POSTGRES_USER:-miem}:${POSTGRES_PASSWORD:-miem_password}@postgres:5432/${POSTGRES_DB:-miem_workers}
    ports:
-      - "127.0.0.1:8001:8000"
+      - "127.0.0.1:${MCP_PORT:-8001}:8000"
    depends_on:
      postgres:
        condition: service_healthy
--- a/migrations/001_init.sql
+++ b/migrations/001_init.sql
@@ -13,6 +13,7 @@ CREATE TABLE IF NOT EXISTS crawl_runs (
  finished_at TIMESTAMPTZ,
  found_count INTEGER NOT NULL DEFAULT 0,
  parsed_count INTEGER NOT NULL DEFAULT 0,
+  skipped_count INTEGER NOT NULL DEFAULT 0,
  new_count INTEGER NOT NULL DEFAULT 0,
  error_count INTEGER NOT NULL DEFAULT 0,
  dismissed_count INTEGER NOT NULL DEFAULT 0,
@@ -73,3 +74,22 @@ CREATE TABLE IF NOT EXISTS profile_tabs (
 );
 CREATE INDEX IF NOT EXISTS ix_profile_tabs_employee_id ON profile_tabs (employee_id);
+CREATE TABLE IF NOT EXISTS parse_resource_cache (
+  id SERIAL PRIMARY KEY,
+  profile_key VARCHAR(255) NOT NULL,
+  resource_key VARCHAR(255) NOT NULL,
+  method VARCHAR(16) NOT NULL,
+  url TEXT NOT NULL,
+  request_fingerprint VARCHAR(64) NOT NULL,
+  etag TEXT,
+  last_modified TEXT,
+  body_hash VARCHAR(64) NOT NULL,
+  body_snapshot BYTEA NOT NULL,
+  parser_version VARCHAR(32),
+  fetched_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+  CONSTRAINT uq_parse_resource_cache_resource UNIQUE (profile_key, resource_key, request_fingerprint)
+);
+CREATE INDEX IF NOT EXISTS ix_parse_resource_cache_profile_key
+  ON parse_resource_cache (profile_key);
--- a/migrations/003_crawl_run_employee_changes.sql
+++ b/migrations/003_crawl_run_employee_changes.sql
@@ -0,0 +1,21 @@
+CREATE TABLE IF NOT EXISTS crawl_run_employee_changes (
+  id SERIAL PRIMARY KEY,
+  crawl_run_id INTEGER NOT NULL REFERENCES crawl_runs(id),
+  employee_id INTEGER REFERENCES employees(id),
+  profile_key VARCHAR(255) NOT NULL,
+  profile_url TEXT NOT NULL,
+  full_name TEXT,
+  change_type VARCHAR(32) NOT NULL,
+  profile_available BOOLEAN,
+  message TEXT,
+  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+CREATE INDEX IF NOT EXISTS ix_crawl_run_employee_changes_run_id
+  ON crawl_run_employee_changes (crawl_run_id);
+CREATE INDEX IF NOT EXISTS ix_crawl_run_employee_changes_employee_id
+  ON crawl_run_employee_changes (employee_id);
+CREATE INDEX IF NOT EXISTS ix_crawl_run_employee_changes_change_type
+  ON crawl_run_employee_changes (change_type);
--- a/migrations/004_dataset_versions.sql
+++ b/migrations/004_dataset_versions.sql
@@ -0,0 +1,29 @@
+CREATE TABLE IF NOT EXISTS dataset_versions (
+  id SERIAL PRIMARY KEY,
+  hash VARCHAR(64) NOT NULL UNIQUE,
+  previous_hash VARCHAR(64),
+  crawl_run_id INTEGER REFERENCES crawl_runs(id),
+  employee_count INTEGER NOT NULL DEFAULT 0,
+  active_count INTEGER NOT NULL DEFAULT 0,
+  dismissed_count INTEGER NOT NULL DEFAULT 0,
+  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
+);
+CREATE INDEX IF NOT EXISTS ix_dataset_versions_created_at
+  ON dataset_versions (created_at);
+CREATE TABLE IF NOT EXISTS dataset_version_items (
+  id SERIAL PRIMARY KEY,
+  dataset_version_id INTEGER NOT NULL REFERENCES dataset_versions(id),
+  profile_key VARCHAR(255) NOT NULL,
+  employee_id INTEGER REFERENCES employees(id),
+  status VARCHAR(32) NOT NULL,
+  checksum VARCHAR(64) NOT NULL,
+  CONSTRAINT uq_dataset_version_items_version_profile UNIQUE (dataset_version_id, profile_key)
+);
+CREATE INDEX IF NOT EXISTS ix_dataset_version_items_hash
+  ON dataset_version_items (dataset_version_id);
+CREATE INDEX IF NOT EXISTS ix_dataset_version_items_profile_key
+  ON dataset_version_items (profile_key);
--- a/migrations/005_parse_resource_cache.sql
+++ b/migrations/005_parse_resource_cache.sql
@@ -0,0 +1,21 @@
+ALTER TABLE crawl_runs
+ADD COLUMN IF NOT EXISTS skipped_count INTEGER NOT NULL DEFAULT 0;
+CREATE TABLE IF NOT EXISTS parse_resource_cache (
+  id SERIAL PRIMARY KEY,
+  profile_key VARCHAR(255) NOT NULL,
+  resource_key VARCHAR(255) NOT NULL,
+  method VARCHAR(16) NOT NULL,
+  url TEXT NOT NULL,
+  request_fingerprint VARCHAR(64) NOT NULL,
+  etag TEXT,
+  last_modified TEXT,
+  body_hash VARCHAR(64) NOT NULL,
+  body_snapshot BYTEA NOT NULL,
+  parser_version VARCHAR(32),
+  fetched_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+  CONSTRAINT uq_parse_resource_cache_resource UNIQUE (profile_key, resource_key, request_fingerprint)
+);
+CREATE INDEX IF NOT EXISTS ix_parse_resource_cache_profile_key
+  ON parse_resource_cache (profile_key);
--- a/migrations/006_employee_publications.sql
+++ b/migrations/006_employee_publications.sql
@@ -0,0 +1,39 @@
+CREATE TABLE IF NOT EXISTS employee_publications (
+  id SERIAL PRIMARY KEY,
+  employee_id INTEGER NOT NULL REFERENCES employees(id) ON DELETE CASCADE,
+  publication_id VARCHAR(64),
+  title TEXT NOT NULL,
+  year INTEGER,
+  publication_type VARCHAR(64),
+  language VARCHAR(16),
+  status INTEGER,
+  url TEXT,
+  doi_url TEXT,
+  other_url TEXT,
+  document_url TEXT,
+  citation_text TEXT,
+  annotation JSONB,
+  description JSONB,
+  authors JSONB,
+  raw_data JSONB,
+  source_hash VARCHAR(64) NOT NULL,
+  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+  CONSTRAINT uq_employee_publications_employee_publication UNIQUE (employee_id, publication_id),
+  CONSTRAINT uq_employee_publications_employee_source_hash UNIQUE (employee_id, source_hash)
+);
+CREATE INDEX IF NOT EXISTS ix_employee_publications_employee_id
+  ON employee_publications (employee_id);
+CREATE INDEX IF NOT EXISTS ix_employee_publications_publication_id
+  ON employee_publications (publication_id);
+CREATE INDEX IF NOT EXISTS ix_employee_publications_doi_url
+  ON employee_publications (doi_url);
+CREATE INDEX IF NOT EXISTS ix_employee_publications_year
+  ON employee_publications (year);
+CREATE INDEX IF NOT EXISTS ix_employee_publications_publication_type
+  ON employee_publications (publication_type);
--- a/migrations/007_employee_news_links.sql
+++ b/migrations/007_employee_news_links.sql
@@ -0,0 +1,27 @@
+CREATE TABLE IF NOT EXISTS employee_news_links (
+  id SERIAL PRIMARY KEY,
+  employee_id INTEGER NOT NULL REFERENCES employees(id) ON DELETE CASCADE,
+  title TEXT NOT NULL,
+  url TEXT,
+  summary TEXT,
+  published_at TIMESTAMPTZ,
+  published_year INTEGER,
+  source_hash VARCHAR(64) NOT NULL,
+  raw_data JSONB,
+  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+  CONSTRAINT uq_employee_news_links_employee_url UNIQUE (employee_id, url),
+  CONSTRAINT uq_employee_news_links_employee_source_hash UNIQUE (employee_id, source_hash)
+);
+CREATE INDEX IF NOT EXISTS ix_employee_news_links_employee_id
+  ON employee_news_links (employee_id);
+CREATE INDEX IF NOT EXISTS ix_employee_news_links_url
+  ON employee_news_links (url);
+CREATE INDEX IF NOT EXISTS ix_employee_news_links_published_at
+  ON employee_news_links (published_at);
+CREATE INDEX IF NOT EXISTS ix_employee_news_links_published_year
+  ON employee_news_links (published_year);
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "miem-workers"
-version = "0.1.0"
+version = "0.7.0"
 description = "MIEM employees parser, admin API, and MCP server"
 requires-python = ">=3.11"
 dependencies = [
--- a/tests/test_admin_data.py
+++ b/tests/test_admin_data.py
@@ -1,15 +1,25 @@
 from datetime import datetime, timezone
-from app.models import CrawlRun, Employee
+from app.models import CrawlError, CrawlRun, CrawlRunEmployeeChange, Employee, EmployeeNewsLink
 from app.services.admin_data import (
    employee_detail_payload,
    employee_display_payload,
    format_admin_datetime,
    list_employees_page,
    run_detail_payload,
    run_payload,
    stats_payload,
 )
 def test_format_admin_datetime_handles_datetime_string_and_none():
    value = datetime(2026, 4, 28, 17, 13, 34, tzinfo=timezone.utc)
    assert format_admin_datetime(value) == "28.04.2026 20:13"
    assert format_admin_datetime("2026-04-28T17:13:34.448605+00:00") == "28.04.2026 20:13"
    assert format_admin_datetime(None) == "Не указано"
 def test_employee_display_payload_extracts_common_fields(db_session):
    employee = Employee(
        profile_key="staff:person",
@@ -25,6 +35,7 @@ def test_employee_display_payload_extracts_common_fields(db_session):
            "sections": [
                {"type": "publications", "publications": [{"title": "Paper"}]},
                {"type": "courses_by_year", "courses": [{"title": "Course"}]},
+                {"type": "news", "news_links": [{"title": "News", "url": "https://example.test/news"}]},
            ],
        },
    )
@@ -32,9 +43,12 @@ def test_employee_display_payload_extracts_common_fields(db_session):
    payload = employee_display_payload(employee)
    assert payload["positions_text"] == "Professor"
    assert payload["status_display"] == "Работает"
    assert payload["email_text"] == "person@hse.ru"
    assert payload["publications_count"] == 1
    assert payload["courses_count"] == 1
    assert payload["news_count"] == 1
    assert payload["first_seen_display"] != "Не указано"
 def test_employee_detail_payload_normalizes_human_readable_sections(db_session):
@@ -74,11 +88,37 @@ def test_employee_detail_payload_normalizes_human_readable_sections(db_session):
                    "academic_year": "2025/2026",
                    "courses": [{"title": "Course", "url": "https://example.test/course"}],
                },
+                {
+                    "title": "ВКР",
+                    "type": "graduation_theses",
+                    "theses_count": 1,
+                    "theses": [
+                        {
+                            "student": "Student Name",
+                            "title": "Thesis title",
+                            "defense_year": 2025,
+                            "project_url": "https://www.hse.ru/edu/vkr/1",
+                        }
+                    ],
+                },
                {
                    "title": "Fallback",
                    "type": "generic",
                    "raw_text": "Fallback text",
                },
+                {
+                    "title": "В новостях",
+                    "type": "news",
+                    "news_links": [
+                        {
+                            "title": "News title",
+                            "url": "https://example.test/news",
+                            "summary": "News summary",
+                            "published_at": "2026-04-28T00:00:00+00:00",
+                            "published_year": 2026,
+                        }
+                    ],
+                },
            ],
        },
    )
@@ -91,7 +131,43 @@ def test_employee_detail_payload_normalizes_human_readable_sections(db_session):
    assert payload["sections"][0]["year_entries"][0]["text"] == "Master degree"
    assert payload["sections"][1]["publications"][0]["title"] == "Paper"
    assert payload["sections"][2]["courses"][0]["title"] == "Course"
-    assert payload["sections"][3]["paragraphs"] == ["Fallback text"]
+    assert payload["sections"][3]["theses"][0]["student"] == "Student Name"
+    assert payload["sections"][4]["paragraphs"] == ["Fallback text"]
+    assert payload["sections"][5]["news_links"][0]["title"] == "News title"
+    assert payload["news_links"][0]["published_display"] == "28.04.2026"
+def test_employee_payload_prefers_stored_news_links(db_session):
+    employee = Employee(
+        profile_key="staff:news",
+        canonical_url="https://www.hse.ru/staff/news",
+        full_name="News Person",
+        status="active",
+        first_seen_at=datetime.now(timezone.utc),
+        last_seen_at=datetime.now(timezone.utc),
+        current_data={"sections": [{"type": "news", "news_links": [{"title": "Old news"}]}]},
+    )
+    db_session.add(employee)
+    db_session.commit()
+    db_session.add(
+        EmployeeNewsLink(
+            employee_id=employee.id,
+            title="Stored news",
+            url="https://example.test/stored",
+            summary="Stored summary",
+            published_at=datetime(2026, 4, 28, tzinfo=timezone.utc),
+            published_year=2026,
+            source_hash="b" * 64,
+        )
+    )
+    db_session.commit()
+    display = employee_display_payload(employee)
+    detail = employee_detail_payload(employee)
+    assert display["news_count"] == 1
+    assert detail["news_links"][0]["title"] == "Stored news"
+    assert detail["news_links"][0]["published_display"] == "28.04.2026"
 def test_employee_payloads_tolerate_malformed_current_data(db_session):
@@ -143,7 +219,8 @@ def test_list_employees_page_filters_sorts_and_paginates(db_session):
    page = list_employees_page(db_session, status="active", sort="full_name", direction="asc", limit=10)
    assert page["total"] == 1
-    assert page["items"][0]["full_name"] == "Alpha"
+    assert page["employees"][0]["full_name"] == "Alpha"
+    assert page["limit"] == 50
 def test_stats_payload_uses_latest_run_new_count(db_session):
@@ -173,10 +250,52 @@ def test_run_payload_calculates_progress():
        status="running",
        found_count=10,
        parsed_count=4,
+        skipped_count=2,
        error_count=1,
    )
    payload = run_payload(run)
-    assert payload["processed_count"] == 5
+    assert payload["processed_count"] == 7
-    assert payload["progress_percent"] == 50.0
+    assert payload["progress_percent"] == 70.0
    assert payload["status_display"] == "Выполняется"
+def test_run_detail_payload_groups_changes_and_handles_old_runs(db_session):
+    old_run = CrawlRun(source_url="https://miem.hse.ru/persons", status="completed")
+    run = CrawlRun(source_url="https://miem.hse.ru/persons", status="completed", new_count=1)
+    employee = Employee(
+        profile_key="staff:new",
+        canonical_url="https://www.hse.ru/staff/new",
+        full_name="New Person",
+        status="active",
+        first_seen_at=datetime.now(timezone.utc),
+        last_seen_at=datetime.now(timezone.utc),
+    )
+    db_session.add_all([old_run, run, employee])
+    db_session.commit()
+    db_session.add(
+        CrawlRunEmployeeChange(
+            crawl_run_id=run.id,
+            employee_id=employee.id,
+            profile_key=employee.profile_key,
+            profile_url=employee.canonical_url,
+            full_name=employee.full_name,
+            change_type="new",
+            profile_available=True,
+            message="added",
+        )
+    )
+    db_session.add(
+        CrawlError(crawl_run_id=run.id, profile_url=employee.canonical_url, error_type="ValueError", message="bad")
+    )
+    db_session.commit()
+    payload = run_detail_payload(db_session, run)
+    old_payload = run_detail_payload(db_session, old_run)
+    assert payload["changes_detail_available"] is True
+    assert payload["changes"]["new"][0]["full_name"] == "New Person"
+    assert payload["errors"][0]["error_type"] == "ValueError"
+    assert old_payload["changes_detail_available"] is False
+    assert old_payload["changes"]["new"] == []
--- a/tests/test_admin_templates.py
+++ b/tests/test_admin_templates.py
@@ -0,0 +1,95 @@
+from pathlib import Path
+def test_base_navigation_is_russian_and_has_no_legacy_employees_link():
+    template = Path("app/templates/base.html").read_text(encoding="utf-8")
+    assert "Обзор" in template
+    assert "Сотрудники" in template
+    assert "Запуски" in template
+    assert "Выйти" in template
+    assert '<a class="admin__brand-link" href="/admin">MIEM Employees</a>' in template
+    assert ">Employees<" not in template
+    assert "/admin/employees" not in template
+def test_directory_template_is_russian_and_uses_display_dates():
+    template = Path("app/templates/directory.html").read_text(encoding="utf-8")
+    assert "Сотрудники" in template
+    assert "Колонки" in template
+    assert "Применить" in template
+    assert "На странице: {{ value }}" in template
+    assert "{% for value in [25, 50, 100] %}" in template
+    assert "Найдено:" in template
+    assert "Новости" in template
+    assert "employee.news_count" in template
+    assert "employee.first_seen_display" in template
+    assert "employee.last_seen_display" in template
+    assert "employee.dismissed_display" in template
+    assert "Directory" not in template
+    assert "employees found" not in template
+def test_admin_employees_route_redirects_to_directory():
+    source = Path("app/admin.py").read_text(encoding="utf-8")
+    assert 'RedirectResponse("/admin/directory", status_code=303)' in source
+def test_dashboard_limits_latest_runs_to_five():
+    source = Path("app/admin.py").read_text(encoding="utf-8")
+    assert "order_by(desc(CrawlRun.started_at)).limit(5)" in source
+    assert "order_by(desc(CrawlRun.started_at)).limit(10)" not in source
+def test_runs_template_links_to_run_detail():
+    template = Path("app/templates/runs.html").read_text(encoding="utf-8")
+    assert 'onclick="window.location.href=\'/admin/runs/{{ run.id }}\'"' in template
+    assert "onkeydown=\"if (event.key === 'Enter' || event.key === ' ')" in template
+    assert 'role="link"' in template
+    assert 'tabindex="0"' in template
+    assert 'data-row-href="/admin/runs/{{ run.id }}"' not in template
+    assert '<a class="admin__link" href="/admin/runs/{{ run.id }}">' not in template
+def test_run_detail_template_extends_base_and_shows_change_groups():
+    template = Path("app/templates/run_detail.html").read_text(encoding="utf-8")
+    assert '{% extends "base.html" %}' in template
+    assert 'id="new-employees"' in template
+    assert "Новые сотрудники" in template
+    assert "Потеряшки" in template
+    assert "Уволенные" in template
+    assert "Детализация сотрудников для этого запуска недоступна" in template
+def test_dashboard_metric_cards_link_to_admin_targets():
+    template = Path("app/templates/dashboard.html").read_text(encoding="utf-8")
+    assert 'href="/admin/directory"' in template
+    assert 'href="/admin/directory?status=active"' in template
+    assert '/admin/runs/{{ latest_run.id }}#new-employees' in template
+    assert 'href="/admin/directory?status=dismissed"' in template
+    assert 'href="/admin/runs"' in template
+def test_dashboard_latest_run_rows_link_to_run_detail():
+    template = Path("app/templates/dashboard.html").read_text(encoding="utf-8")
+    assert 'onclick="window.location.href=\'/admin/runs/{{ run.id }}\'"' in template
+    assert "onkeydown=\"if (event.key === 'Enter' || event.key === ' ')" in template
+    assert 'role="link"' in template
+    assert 'tabindex="0"' in template
+    assert 'data-row-href="/admin/runs/{{ run.id }}"' not in template
+    assert '<a class="admin__link" href="/admin/runs/{{ run.id }}">' not in template
+def test_admin_js_supports_keyboard_activation_for_clickable_rows():
+    source = Path("app/static/admin.js").read_text(encoding="utf-8")
+    assert 'addEventListener("keydown"' in source
+    assert '"Enter"' in source
+    assert '" "' in source
--- a/tests/test_api_mcp.py
+++ b/tests/test_api_mcp.py
@@ -1,14 +1,16 @@
 import json
 from datetime import datetime, timezone
 from types import SimpleNamespace
 from fastapi.testclient import TestClient
-from sqlalchemy import create_engine
+from sqlalchemy import create_engine, select
 from sqlalchemy.orm import sessionmaker
 from sqlalchemy.pool import StaticPool
 from app.config import Settings, get_settings
 from app.db import Base, get_db
 from app.main import app
-from app.models import CrawlRun, Employee
+from app.models import CrawlRun, CrawlRunEmployeeChange, Employee, EmployeePublication
 from app.security import SESSION_COOKIE, sign_session
@@ -18,10 +20,10 @@ def test_health_returns_versions():
    response = client.get("/api/health")
    assert response.status_code == 200
-    assert response.json()["backend_version"] == "0.2.4"
+    assert response.json()["backend_version"] == "0.7.0"
-def test_mcp_requires_token_and_lists_tools():
+def test_mcp_lists_tools_without_auth_and_ignores_auth_header():
    engine = create_engine(
        "sqlite:///:memory:",
        connect_args={"check_same_thread": False},
@@ -38,19 +40,23 @@ def test_mcp_requires_token_and_lists_tools():
            session.close()
    app.dependency_overrides[get_db] = override_db
    app.dependency_overrides[get_settings] = lambda: Settings(mcp_token="secret", session_secret="session-secret")
    client = TestClient(app)
-    unauthorized = client.post("/mcp", json={"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}})
+    without_auth = client.post("/mcp", json={"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}})
-    authorized = client.post(
+    with_auth = client.post(
        "/mcp",
-        headers={"Authorization": "Bearer secret"},
+        headers={"Authorization": "Bearer anything"},
        json={"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}},
    )
-    assert unauthorized.status_code == 401
+    assert without_auth.status_code == 200
-    assert authorized.status_code == 200
+    assert with_auth.status_code == 200
-    assert authorized.json()["result"]["tools"][0]["name"] == "search_employees"
+    tool_names = {tool["name"] for tool in without_auth.json()["result"]["tools"]}
+    assert "search_employees" in tool_names
+    assert "get_service_info" in tool_names
+    assert "sync_employees" in tool_names
+    assert any(tool["name"] == "get_crawl_run_details" for tool in without_auth.json()["result"]["tools"])
+    assert with_auth.json()["result"]["tools"] == without_auth.json()["result"]["tools"]
    app.dependency_overrides.clear()
@@ -88,12 +94,10 @@ def test_mcp_search_employees_returns_matching_employee():
            db.close()
    app.dependency_overrides[get_db] = override_db
    app.dependency_overrides[get_settings] = lambda: Settings(mcp_token="secret", session_secret="session-secret")
    client = TestClient(app)
    response = client.post(
        "/mcp",
        headers={"Authorization": "Bearer secret"},
        json={
            "jsonrpc": "2.0",
            "id": 1,
@@ -108,6 +112,304 @@ def test_mcp_search_employees_returns_matching_employee():
    app.dependency_overrides.clear()
+def test_mcp_service_info_returns_tools_and_dataset_hash():
+    engine = create_engine(
+        "sqlite:///:memory:",
+        connect_args={"check_same_thread": False},
+        poolclass=StaticPool,
+    )
+    Base.metadata.create_all(engine)
+    Session = sessionmaker(bind=engine)
+    session = Session()
+    session.add(
+        Employee(
+            profile_key="staff:alpha",
+            profile_type="staff",
+            profile_id="alpha",
+            canonical_url="https://www.hse.ru/staff/alpha",
+            full_name="Alpha Person",
+            status="active",
+            current_checksum="a" * 64,
+            current_data={"sections": []},
+        )
+    )
+    session.commit()
+    session.close()
+    def override_db():
+        db = Session()
+        try:
+            yield db
+        finally:
+            db.close()
+    app.dependency_overrides[get_db] = override_db
+    client = TestClient(app)
+    response = client.post(
+        "/mcp",
+        json={"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "get_service_info", "arguments": {}}},
+    )
+    assert response.status_code == 200
+    payload = json.loads(response.json()["result"]["content"][0]["text"])
+    assert payload["service_name"] == "miem-employees"
+    assert payload["backend_version"] == "0.7.0"
+    assert payload["dataset"]["hash"]
+    assert any(tool["name"] == "sync_employees" for tool in payload["tools"])
+    app.dependency_overrides.clear()
+def test_mcp_list_employee_publications_prefers_stored_publications_with_fallback():
+    engine = create_engine(
+        "sqlite:///:memory:",
+        connect_args={"check_same_thread": False},
+        poolclass=StaticPool,
+    )
+    Base.metadata.create_all(engine)
+    Session = sessionmaker(bind=engine)
+    session = Session()
+    stored_employee = Employee(
+        profile_key="staff:stored",
+        profile_type="staff",
+        profile_id="stored",
+        canonical_url="https://www.hse.ru/staff/stored",
+        full_name="Stored Person",
+        status="active",
+        current_data={
+            "sections": [
+                {
+                    "type": "publications",
+                    "publications": [{"title": "Old JSON Publication", "url": "https://example.test/old"}],
+                }
+            ]
+        },
+    )
+    fallback_employee = Employee(
+        profile_key="staff:fallback",
+        profile_type="staff",
+        profile_id="fallback",
+        canonical_url="https://www.hse.ru/staff/fallback",
+        full_name="Fallback Person",
+        status="active",
+        current_data={
+            "sections": [
+                {
+                    "type": "publications",
+                    "publications": [{"title": "Fallback Publication", "url": "https://example.test/fallback"}],
+                }
+            ]
+        },
+    )
+    session.add_all([stored_employee, fallback_employee])
+    session.commit()
+    session.add(
+        EmployeePublication(
+            employee_id=stored_employee.id,
+            publication_id="pub-1",
+            title="Stored Publication",
+            year=2024,
+            publication_type="ARTICLE",
+            url="https://publications.hse.ru/view/pub-1",
+            doi_url="https://doi.org/10.1/test",
+            citation_text="Stored Citation",
+            annotation={"ru": "Аннотация", "en": "Abstract"},
+            description={"main": "Stored Citation"},
+            authors=[{"id": "1", "title_ru": "Автор", "is_current_employee": True}],
+            source_hash="a" * 64,
+        )
+    )
+    session.commit()
+    session.close()
+    def override_db():
+        db = Session()
+        try:
+            yield db
+        finally:
+            db.close()
+    app.dependency_overrides[get_db] = override_db
+    client = TestClient(app)
+    stored_response = client.post(
+        "/mcp",
+        json={
+            "jsonrpc": "2.0",
+            "id": 1,
+            "method": "tools/call",
+            "params": {"name": "list_employee_publications", "arguments": {"profile_id_or_url": "stored"}},
+        },
+    )
+    fallback_response = client.post(
+        "/mcp",
+        json={
+            "jsonrpc": "2.0",
+            "id": 2,
+            "method": "tools/call",
+            "params": {"name": "list_employee_publications", "arguments": {"profile_id_or_url": "fallback"}},
+        },
+    )
+    stored_payload = json.loads(stored_response.json()["result"]["content"][0]["text"])
+    fallback_payload = json.loads(fallback_response.json()["result"]["content"][0]["text"])
+    assert stored_payload["items"][0]["title"] == "Stored Publication"
+    assert stored_payload["items"][0]["doi_url"] == "https://doi.org/10.1/test"
+    assert stored_payload["items"][0]["annotation"] == {"ru": "Аннотация", "en": "Abstract"}
+    assert stored_payload["items"][0]["authors"] == [{"id": "1", "title_ru": "Автор", "is_current_employee": True}]
+    assert fallback_payload["items"][0]["title"] == "Fallback Publication"
+    app.dependency_overrides.clear()
+def test_mcp_sync_employees_full_empty_and_unknown_hash_modes():
+    engine = create_engine(
+        "sqlite:///:memory:",
+        connect_args={"check_same_thread": False},
+        poolclass=StaticPool,
+    )
+    Base.metadata.create_all(engine)
+    Session = sessionmaker(bind=engine)
+    session = Session()
+    session.add(
+        Employee(
+            profile_key="staff:alpha",
+            profile_type="staff",
+            profile_id="alpha",
+            canonical_url="https://www.hse.ru/staff/alpha",
+            full_name="Alpha Person",
+            status="active",
+            current_checksum="a" * 64,
+            current_data={"sections": [{"type": "paragraphs"}]},
+        )
+    )
+    session.commit()
+    session.close()
+    def override_db():
+        db = Session()
+        try:
+            yield db
+        finally:
+            db.close()
+    app.dependency_overrides[get_db] = override_db
+    client = TestClient(app)
+    full_response = client.post(
+        "/mcp",
+        json={"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "sync_employees", "arguments": {}}},
+    )
+    full_payload = json.loads(full_response.json()["result"]["content"][0]["text"])
+    current_hash = full_payload["to_hash"]
+    empty_response = client.post(
+        "/mcp",
+        json={
+            "jsonrpc": "2.0",
+            "id": 2,
+            "method": "tools/call",
+            "params": {"name": "sync_employees", "arguments": {"client_hash": current_hash}},
+        },
+    )
+    empty_payload = json.loads(empty_response.json()["result"]["content"][0]["text"])
+    unknown_response = client.post(
+        "/mcp",
+        json={
+            "jsonrpc": "2.0",
+            "id": 3,
+            "method": "tools/call",
+            "params": {"name": "sync_employees", "arguments": {"client_hash": "missing"}},
+        },
+    )
+    unknown_payload = json.loads(unknown_response.json()["result"]["content"][0]["text"])
+    assert full_payload["mode"] == "full"
+    assert full_payload["items"][0]["data"] == {"sections": [{"type": "paragraphs"}]}
+    assert empty_payload["mode"] == "delta"
+    assert empty_payload["changes"] == {"added": [], "updated": [], "dismissed": [], "removed": []}
    assert unknown_payload["mode"] == "full"
    assert unknown_payload["reason"] == "unknown_client_hash"
    app.dependency_overrides.clear()
 def test_mcp_get_crawl_run_details_returns_changes():
    engine = create_engine(
        "sqlite:///:memory:",
        connect_args={"check_same_thread": False},
        poolclass=StaticPool,
    )
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)
    session = Session()
    run = CrawlRun(source_url="https://miem.hse.ru/persons", status="completed", new_count=1)
    employee = Employee(
        profile_key="staff:new",
        profile_type="staff",
        profile_id="new",
        canonical_url="https://www.hse.ru/staff/new",
        full_name="New Person",
        status="active",
        first_seen_at=datetime.now(timezone.utc),
        last_seen_at=datetime.now(timezone.utc),
    )
    session.add_all([run, employee])
    session.commit()
    session.add(
        CrawlRunEmployeeChange(
            crawl_run_id=run.id,
            employee_id=employee.id,
            profile_key=employee.profile_key,
            profile_url=employee.canonical_url,
            full_name=employee.full_name,
            change_type="new",
            profile_available=True,
            message="added",
        )
    )
    session.commit()
    run_id = run.id
    session.close()
    def override_db():
        db = Session()
        try:
            yield db
        finally:
            db.close()
    app.dependency_overrides[get_db] = override_db
    client = TestClient(app)
    response = client.post(
        "/mcp",
        json={
            "jsonrpc": "2.0",
            "id": 1,
            "method": "tools/call",
            "params": {"name": "get_crawl_run_details", "arguments": {"run_id": run_id}},
        },
    )
    assert response.status_code == 200
    text = response.json()["result"]["content"][0]["text"]
    assert "New Person" in text
    assert "changes_detail_available" in text
    app.dependency_overrides.clear()
 def test_mcp_protected_resource_metadata_route_is_removed():
    client = TestClient(app)
    response = client.get("/.well-known/oauth-protected-resource")
    assert response.status_code == 404
 def test_api_employees_and_stats_require_admin_session():
    engine = create_engine(
        "sqlite:///:memory:",
@@ -130,8 +432,23 @@ def test_api_employees_and_stats_require_admin_session():
            current_data={"contacts": {"emails": ["alpha@hse.ru"]}, "sections": []},
        )
    )
-    db.add(CrawlRun(source_url="https://miem.hse.ru/persons", status="completed", new_count=1))
+    run = CrawlRun(source_url="https://miem.hse.ru/persons", status="completed", new_count=1)
    db.add(run)
    db.commit()
    db.add(
        CrawlRunEmployeeChange(
            crawl_run_id=run.id,
            employee_id=1,
            profile_key="staff:alpha",
            profile_url="https://www.hse.ru/staff/alpha",
            full_name="Alpha Person",
            change_type="new",
            profile_available=True,
            message="added",
        )
    )
    db.commit()
    run_id = run.id
    db.close()
    settings = Settings(admin_username="admin", admin_password="password", session_secret="session-secret")
@@ -150,10 +467,66 @@ def test_api_employees_and_stats_require_admin_session():
    employees = client.get("/api/employees", params={"q": "Alpha", "has_email": True})
    stats = client.get("/api/stats")
    run_details = client.get(f"/api/crawl-runs/{run_id}")
    assert employees.status_code == 200
    assert employees.json()["total"] == 1
    assert stats.status_code == 200
    assert stats.json()["new_in_last_run"] == 1
    assert run_details.status_code == 200
    assert run_details.json()["changes"]["new"][0]["full_name"] == "Alpha Person"
    app.dependency_overrides.clear()
 def test_admin_refresh_employee_route_updates_only_requested_employee(monkeypatch):
    engine = create_engine(
        "sqlite:///:memory:",
        connect_args={"check_same_thread": False},
        poolclass=StaticPool,
    )
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)
    db = Session()
    db.add(
        Employee(
            profile_key="org_person:133709486",
            profile_type="org_person",
            profile_id="133709486",
            canonical_url="https://www.hse.ru/org/persons/133709486",
            full_name="Будков Юрий Алексеевич",
            status="active",
        )
    )
    db.commit()
    employee_id = db.scalar(select(Employee.id))
    db.close()
    settings = Settings(admin_username="admin", admin_password="password", session_secret="session-secret")
    def override_db():
        session = Session()
        try:
            yield session
        finally:
            session.close()
    calls = []
    def fake_refresh_employee(db, refreshed_employee, route_settings):
        calls.append((refreshed_employee.id, route_settings))
        return SimpleNamespace(status="completed")
    app.dependency_overrides[get_db] = override_db
    app.dependency_overrides[get_settings] = lambda: settings
    monkeypatch.setattr("app.admin.refresh_employee", fake_refresh_employee)
    client = TestClient(app)
    client.cookies.set(SESSION_COOKIE, sign_session("admin", settings))
    response = client.post(f"/admin/employees/{employee_id}/refresh", follow_redirects=False)
    assert response.status_code == 303
    assert response.headers["location"] == f"/admin/employees/{employee_id}?refresh_status=success"
    assert calls == [(employee_id, settings)]
    app.dependency_overrides.clear()
--- a/tests/test_config.py
+++ b/tests/test_config.py
@@ -0,0 +1,13 @@
 from app.config import Settings
 def test_empty_crawl_limit_is_treated_as_none():
    settings = Settings(crawl_limit="")
    assert settings.crawl_limit is None
 def test_numeric_crawl_limit_is_parsed():
    settings = Settings(crawl_limit="25")
    assert settings.crawl_limit == 25
--- a/tests/test_crawler.py
+++ b/tests/test_crawler.py
@@ -1,10 +1,64 @@
 import gzip
 from datetime import datetime, timezone
-from app.models import CrawlRun, Employee
-from app.services.crawler import _mark_dismissed, _upsert_employee
+from app.models import (
+    CrawlError,
    CrawlRun,
    CrawlRunEmployeeChange,
    Employee,
    EmployeeNewsLink,
    EmployeePublication,
    EmployeeSnapshot,
    ParseResourceCache,
 )
 from app.services.crawler import _checksum, _mark_dismissed, _upsert_employee
 from app.services.resource_cache import ResourceCache
-def test_mark_dismissed_only_marks_missing_active_employees(db_session):
+class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code
 class FakeSession:
    def __init__(self, statuses):
        self.statuses = statuses
    def get(self, url, **_kwargs):
        return FakeResponse(self.statuses[url])
 class ConditionalResponse:
    def __init__(self, status_code, text="", headers=None):
        self.status_code = status_code
        self._text = text
        self.headers = headers or {}
        self.text_read = False
    @property
    def text(self):
        self.text_read = True
        return self._text
    def raise_for_status(self):
        return None
 class ConditionalSession:
    def __init__(self):
        self.requests = []
        self.not_modified_response = ConditionalResponse(304)
    def get(self, url, **kwargs):
        self.requests.append((url, kwargs))
        if kwargs["headers"].get("If-None-Match") == '"cached"':
            return self.not_modified_response
        return ConditionalResponse(200, "fresh", {"ETag": '"fresh"'})
 def test_mark_dismissed_records_missing_source_when_profile_is_available(db_session):
    run = CrawlRun(source_url="https://miem.hse.ru/persons", status="running")
    db_session.add(run)
    db_session.add(
        Employee(
            profile_key="staff:kept",
@@ -16,8 +70,8 @@ def test_mark_dismissed_only_marks_missing_active_employees(db_session):
    )
    db_session.add(
        Employee(
-            profile_key="staff:gone",
+            profile_key="staff:missing",
-            canonical_url="https://www.hse.ru/staff/gone",
+            canonical_url="https://www.hse.ru/staff/missing",
            status="active",
            first_seen_at=datetime.now(timezone.utc),
            last_seen_at=datetime.now(timezone.utc),
@@ -25,16 +79,53 @@ def test_mark_dismissed_only_marks_missing_active_employees(db_session):
    )
    db_session.commit()
-    dismissed = _mark_dismissed(db_session, {"staff:kept"})
+    dismissed = _mark_dismissed(
        db_session,
        run,
        {"staff:kept"},
        FakeSession({"https://www.hse.ru/staff/missing": 200}),
        30,
    )
    assert dismissed == 0
    assert db_session.query(Employee).filter_by(profile_key="staff:kept").one().status == "active"
    missing = db_session.query(Employee).filter_by(profile_key="staff:missing").one()
    assert missing.status == "active"
    assert missing.dismissed_at is None
    change = db_session.query(CrawlRunEmployeeChange).one()
    assert change.change_type == "missing_from_source"
    assert change.profile_available is True
 def test_mark_dismissed_marks_missing_employee_when_profile_is_unavailable(db_session):
    run = CrawlRun(source_url="https://miem.hse.ru/persons", status="running")
    employee = Employee(
        profile_key="staff:gone",
        canonical_url="https://www.hse.ru/staff/gone",
        status="active",
        first_seen_at=datetime.now(timezone.utc),
        last_seen_at=datetime.now(timezone.utc),
    )
    db_session.add_all([run, employee])
    db_session.commit()
    dismissed = _mark_dismissed(
        db_session,
        run,
        set(),
        FakeSession({"https://www.hse.ru/staff/gone": 404}),
        30,
    )
    assert dismissed == 1
-    assert db_session.query(Employee).filter_by(profile_key="staff:kept").one().status == "active"
-    gone = db_session.query(Employee).filter_by(profile_key="staff:gone").one()
-    assert gone.status == "dismissed"
-    assert gone.dismissed_at is not None
+    assert employee.status == "dismissed"
+    assert employee.dismissed_at is not None
+    change = db_session.query(CrawlRunEmployeeChange).one()
+    assert change.change_type == "dismissed"
    assert change.profile_available is False
-def test_upsert_employee_increments_new_count_for_new_employee(db_session):
+def test_upsert_employee_increments_new_count_and_records_change_for_new_employee(db_session):
    run = CrawlRun(source_url="https://miem.hse.ru/persons", status="running")
    db_session.add(run)
    db_session.commit()
@@ -56,3 +147,237 @@ def test_upsert_employee_increments_new_count_for_new_employee(db_session):
    db_session.commit()
    assert run.new_count == 1
    change = db_session.query(CrawlRunEmployeeChange).one()
    assert change.change_type == "new"
    assert change.full_name == "New Person"
 def test_resource_cache_uses_etag_and_reuses_cached_body_on_304(db_session):
    db_session.add(
        ParseResourceCache(
            profile_key="staff:cached",
            resource_key="main-html",
            method="GET",
            url="https://www.hse.ru/staff/cached",
            request_fingerprint="020d59db7b358d9023d0f185bcbf5a9c085d3cf2bf91d92d48eee9147e8d0f01",
            etag='"cached"',
            body_hash="cached-hash",
            body_snapshot=gzip.compress("cached body".encode("utf-8")),
            parser_version="0.6.0",
        )
    )
    db_session.commit()
    session = ConditionalSession()
    result = ResourceCache(db_session).fetch_text(
        session,
        profile_key="staff:cached",
        resource_key="main-html",
        method="GET",
        url="https://www.hse.ru/staff/cached",
        headers={"User-Agent": "test"},
        timeout=10,
    )
    assert session.requests[0][1]["headers"]["If-None-Match"] == '"cached"'
    assert result.text == "cached body"
    assert result.from_cache is True
    assert session.not_modified_response.text_read is False
 def test_upsert_employee_skips_snapshot_when_checksum_is_unchanged(db_session):
    first_run = CrawlRun(source_url="https://miem.hse.ru/persons", status="running")
    second_run = CrawlRun(source_url="https://miem.hse.ru/persons", status="running")
    db_session.add_all([first_run, second_run])
    db_session.commit()
    _, first_changed = _upsert_employee(db_session, first_run, _parsed_employee("same"))
    _, second_changed = _upsert_employee(db_session, second_run, _parsed_employee("same"))
    db_session.commit()
    assert first_changed is True
    assert second_changed is False
    assert db_session.query(EmployeeSnapshot).count() == 1
 def test_upsert_employee_saves_publications_and_reuses_existing_rows(db_session):
    first_run = CrawlRun(source_url="https://miem.hse.ru/persons", status="running")
    second_run = CrawlRun(source_url="https://miem.hse.ru/persons", status="running")
    db_session.add_all([first_run, second_run])
    db_session.commit()
    parsed = _parsed_employee("published")
    parsed["sections"] = [
        {
            "type": "publications",
            "publications": [
                {
                    "id": "888959076",
                    "publication_id": "888959076",
                    "title": "Detailed Publication",
                    "year": 2023,
                    "publication_type": "ARTICLE",
                    "language": "ru",
                    "status": 1,
                    "url": "https://publications.hse.ru/view/888959076",
                    "doi_url": "https://doi.org/10.1/test",
                    "citation_text": "Detailed citation",
                    "annotation": {"ru": "Аннотация"},
                    "description": {"main": "Detailed citation"},
                    "authors": [{"id": "1", "title_ru": "Автор"}],
                    "raw_data": {"id": "888959076", "title": "Detailed Publication"},
                }
            ],
        }
    ]
    employee, _ = _upsert_employee(db_session, first_run, parsed)
    db_session.commit()
    _upsert_employee(db_session, second_run, _parsed_employee_with_publication("published"))
    db_session.commit()
    publications = db_session.query(EmployeePublication).filter_by(employee_id=employee.id).all()
    assert len(publications) == 1
    assert publications[0].doi_url == "https://doi.org/10.1/test"
    assert publications[0].authors == [{"id": "1", "title_ru": "Автор"}]
 def test_upsert_employee_records_publication_errors_without_failing_employee(monkeypatch, db_session):
    run = CrawlRun(source_url="https://miem.hse.ru/persons", status="running")
    db_session.add(run)
    db_session.commit()
    def broken_sync(*_args, **_kwargs):
        raise RuntimeError("boom")
    monkeypatch.setattr("app.services.crawler._sync_employee_publications", broken_sync)
    employee, changed = _upsert_employee(db_session, run, _parsed_employee_with_publication("error-safe"))
    db_session.commit()
    assert changed is True
    assert employee.full_name == "Same Person"
    assert db_session.query(Employee).filter_by(profile_key="staff:error-safe").one()
    error = db_session.query(CrawlError).one()
    assert "публикации" in error.message.lower()
 def test_upsert_employee_saves_news_links_and_reuses_existing_rows(db_session):
    first_run = CrawlRun(source_url="https://miem.hse.ru/persons", status="running")
    second_run = CrawlRun(source_url="https://miem.hse.ru/persons", status="running")
    db_session.add_all([first_run, second_run])
    db_session.commit()
    employee, _ = _upsert_employee(db_session, first_run, _parsed_employee_with_news("news-person"))
    db_session.commit()
    _upsert_employee(db_session, second_run, _parsed_employee_with_news("news-person"))
    db_session.commit()
    news_links = db_session.query(EmployeeNewsLink).filter_by(employee_id=employee.id).all()
    assert len(news_links) == 1
    assert news_links[0].title == "News Title"
    assert news_links[0].url == "https://www.hse.ru/news/1.html"
    assert news_links[0].published_year == 2026
 def test_upsert_employee_records_news_errors_without_failing_employee(monkeypatch, db_session):
    run = CrawlRun(source_url="https://miem.hse.ru/persons", status="running")
    db_session.add(run)
    db_session.commit()
    def broken_sync(*_args, **_kwargs):
        raise RuntimeError("boom")
    monkeypatch.setattr("app.services.crawler._sync_employee_news_links", broken_sync)
    employee, changed = _upsert_employee(db_session, run, _parsed_employee_with_news("news-error-safe"))
    db_session.commit()
    assert changed is True
    assert employee.full_name == "Same Person"
    assert db_session.query(Employee).filter_by(profile_key="staff:news-error-safe").one()
    error = db_session.query(CrawlError).one()
    assert "новости" in error.message.lower()
 def test_checksum_changes_when_widget_data_changes():
    base = _parsed_employee("widgets")
    changed = _parsed_employee("widgets")
    changed["sections"] = [
        {
            "type": "publications",
            "publications": [{"id": "1", "title": "New publication"}],
        }
    ]
    assert _checksum(base) != _checksum(changed)
 def test_checksum_ignores_date_dependent_experience_text():
    first = _parsed_employee("experience")
    second = _parsed_employee("experience")
    first["sections"] = [{"raw_text": "Стаж работы в НИУ ВШЭ: 5 лет"}]
    second["sections"] = [{"raw_text": "Стаж работы в НИУ ВШЭ: 6 лет"}]
    assert _checksum(first) == _checksum(second)
 def _parsed_employee(profile_id: str) -> dict:
    return {
        "source_url": f"https://www.hse.ru/staff/{profile_id}",
        "profile_type": "staff",
        "profile_id": profile_id,
        "full_name": "Same Person",
        "tabs": [],
        "sections": [],
        "parser_version": "0.6.0",
        "_html": "<html></html>",
    }
 def _parsed_employee_with_publication(profile_id: str) -> dict:
    parsed = _parsed_employee(profile_id)
    parsed["sections"] = [
        {
            "type": "publications",
            "publications": [
                {
                    "id": "888959076",
                    "publication_id": "888959076",
                    "title": "Detailed Publication",
                    "year": 2023,
                    "publication_type": "ARTICLE",
                    "language": "ru",
                    "status": 1,
                    "url": "https://publications.hse.ru/view/888959076",
                    "doi_url": "https://doi.org/10.1/test",
                    "citation_text": "Detailed citation",
                    "annotation": {"ru": "Аннотация"},
                    "description": {"main": "Detailed citation"},
                    "authors": [{"id": "1", "title_ru": "Автор"}],
                    "raw_data": {"id": "888959076", "title": "Detailed Publication"},
                }
            ],
        }
    ]
    return parsed
 def _parsed_employee_with_news(profile_id: str) -> dict:
    parsed = _parsed_employee(profile_id)
    parsed["sections"] = [
        {
            "type": "news",
            "news_links": [
                {
                    "title": "News Title",
                    "url": "https://www.hse.ru/news/1.html",
                    "summary": "News summary",
                    "published_at": "2026-04-28T00:00:00+00:00",
                    "published_year": 2026,
                    "raw_data": {"title": "News Title", "url": "https://www.hse.ru/news/1.html"},
                }
            ],
        }
    ]
    return parsed
--- a/tests/test_dataset_versions.py
+++ b/tests/test_dataset_versions.py
@@ -0,0 +1,88 @@
 from datetime import datetime, timezone
 from app.models import Employee
 from app.services.dataset_versions import get_or_create_current_version, sync_employees_payload
 def _employee(profile_key: str, checksum: str, *, status: str = "active") -> Employee:
    return Employee(
        profile_key=profile_key,
        profile_type=profile_key.split(":", 1)[0],
        profile_id=profile_key.split(":", 1)[1],
        canonical_url=f"https://www.hse.ru/{profile_key}",
        full_name=profile_key,
        status=status,
        first_seen_at=datetime.now(timezone.utc),
        last_seen_at=datetime.now(timezone.utc),
        current_data={"profile_key": profile_key},
        current_checksum=checksum,
    )
 def test_dataset_version_hash_is_stable_for_same_employee_state(db_session):
    db_session.add(_employee("staff:alpha", "a" * 64))
    db_session.commit()
    first = get_or_create_current_version(db_session)
    db_session.commit()
    second = get_or_create_current_version(db_session)
    assert second.id == first.id
    assert second.hash == first.hash
    assert second.employee_count == 1
 def test_dataset_version_hash_changes_when_employee_checksum_changes(db_session):
    employee = _employee("staff:alpha", "a" * 64)
    db_session.add(employee)
    db_session.commit()
    first = get_or_create_current_version(db_session)
    db_session.commit()
    employee.current_checksum = "b" * 64
    db_session.commit()
    second = get_or_create_current_version(db_session)
    assert second.hash != first.hash
    assert second.previous_hash == first.hash
 def test_sync_employees_diff_spans_multiple_intermediate_versions(db_session):
    alpha = _employee("staff:alpha", "a" * 64)
    db_session.add(alpha)
    db_session.commit()
    first = get_or_create_current_version(db_session)
    db_session.commit()
    beta = _employee("staff:beta", "b" * 64)
    db_session.add(beta)
    db_session.commit()
    get_or_create_current_version(db_session)
    db_session.commit()
    alpha.current_checksum = "c" * 64
    alpha.current_data = {"profile_key": "staff:alpha", "changed": True}
    db_session.commit()
    payload = sync_employees_payload(db_session, client_hash=first.hash, include_data=False)
    assert payload["mode"] == "delta"
    assert [item["profile_key"] for item in payload["changes"]["added"]] == ["staff:beta"]
    assert [item["profile_key"] for item in payload["changes"]["updated"]] == ["staff:alpha"]
    assert payload["changes"]["dismissed"] == []
    assert payload["changes"]["removed"] == []
 def test_sync_employees_reports_dismissed_as_tombstone(db_session):
    alpha = _employee("staff:alpha", "a" * 64)
    db_session.add(alpha)
    db_session.commit()
    first = get_or_create_current_version(db_session)
    db_session.commit()
    alpha.status = "dismissed"
    db_session.commit()
    payload = sync_employees_payload(db_session, client_hash=first.hash, include_data=False)
    assert payload["changes"]["dismissed"][0]["profile_key"] == "staff:alpha"
    assert payload["changes"]["dismissed"][0]["status"] == "dismissed"
--- a/tests/test_db_schema.py
+++ b/tests/test_db_schema.py
@@ -0,0 +1,115 @@
 from sqlalchemy import create_engine, inspect, text
 from app.db import _ensure_runtime_schema
 def test_runtime_schema_adds_skipped_count_to_existing_crawl_runs_table(monkeypatch):
    engine = create_engine("sqlite:///:memory:")
    with engine.begin() as connection:
        connection.execute(
            text(
                """
                CREATE TABLE crawl_runs (
                  id INTEGER PRIMARY KEY,
                  source_url TEXT NOT NULL,
                  status VARCHAR(32) NOT NULL DEFAULT 'running',
                  found_count INTEGER NOT NULL DEFAULT 0,
                  parsed_count INTEGER NOT NULL DEFAULT 0
                )
                """
            )
        )
    monkeypatch.setattr("app.db.engine", engine)
    _ensure_runtime_schema()
    columns = {column["name"] for column in inspect(engine).get_columns("crawl_runs")}
    assert "skipped_count" in columns
 def test_runtime_schema_creates_employee_publications_table_when_employees_exist(monkeypatch):
    engine = create_engine("sqlite:///:memory:")
    with engine.begin() as connection:
        connection.execute(
            text(
                """
                CREATE TABLE employees (
                  id INTEGER PRIMARY KEY,
                  profile_key VARCHAR(255) NOT NULL UNIQUE,
                  canonical_url TEXT NOT NULL,
                  status VARCHAR(32) NOT NULL DEFAULT 'active',
                  first_seen_at DATETIME NOT NULL,
                  last_seen_at DATETIME NOT NULL,
                  created_at DATETIME NOT NULL,
                  updated_at DATETIME NOT NULL
                )
                """
            )
        )
        connection.execute(
            text(
                """
                CREATE TABLE crawl_runs (
                  id INTEGER PRIMARY KEY,
                  source_url TEXT NOT NULL,
                  status VARCHAR(32) NOT NULL DEFAULT 'running',
                  found_count INTEGER NOT NULL DEFAULT 0,
                  parsed_count INTEGER NOT NULL DEFAULT 0,
                  skipped_count INTEGER NOT NULL DEFAULT 0
                )
                """
            )
        )
    monkeypatch.setattr("app.db.engine", engine)
    _ensure_runtime_schema()
    _ensure_runtime_schema()
    inspector = inspect(engine)
    assert "employee_publications" in inspector.get_table_names()
    columns = {column["name"] for column in inspector.get_columns("employee_publications")}
    assert {"employee_id", "publication_id", "doi_url", "authors", "raw_data", "source_hash"}.issubset(columns)
 def test_runtime_schema_creates_employee_news_links_table_when_employees_exist(monkeypatch):
    engine = create_engine("sqlite:///:memory:")
    with engine.begin() as connection:
        connection.execute(
            text(
                """
                CREATE TABLE employees (
                  id INTEGER PRIMARY KEY,
                  profile_key VARCHAR(255) NOT NULL UNIQUE,
                  canonical_url TEXT NOT NULL,
                  status VARCHAR(32) NOT NULL DEFAULT 'active',
                  first_seen_at DATETIME NOT NULL,
                  last_seen_at DATETIME NOT NULL,
                  created_at DATETIME NOT NULL,
                  updated_at DATETIME NOT NULL
                )
                """
            )
        )
        connection.execute(
            text(
                """
                CREATE TABLE crawl_runs (
                  id INTEGER PRIMARY KEY,
                  source_url TEXT NOT NULL,
                  status VARCHAR(32) NOT NULL DEFAULT 'running',
                  found_count INTEGER NOT NULL DEFAULT 0,
                  parsed_count INTEGER NOT NULL DEFAULT 0,
                  skipped_count INTEGER NOT NULL DEFAULT 0
                )
                """
            )
        )
    monkeypatch.setattr("app.db.engine", engine)
    _ensure_runtime_schema()
    _ensure_runtime_schema()
    inspector = inspect(engine)
    assert "employee_news_links" in inspector.get_table_names()
    columns = {column["name"] for column in inspector.get_columns("employee_news_links")}
    assert {"employee_id", "title", "url", "summary", "published_at", "published_year", "source_hash", "raw_data"}.issubset(columns)
--- a/tests/test_employee_detail_template.py
+++ b/tests/test_employee_detail_template.py
@@ -9,7 +9,27 @@ def test_employee_detail_template_is_human_readable():
    assert ">Tabs<" not in template
    assert "contacts.items" not in template
    assert "contacts.contact_items" in template
    assert "section.items" not in template
    assert "section.list_items" in template
    assert "Основная информация" in template
    assert "Контакты" in template
    assert "В новостях" in template
    assert "employee_view.news_links" in template
    assert "news.summary" in template
    assert "Разделы профиля" in template
-    assert "Snapshots" in template
+    assert "graduation_theses" in template
    assert "Год защиты" in template
    assert "Parser version" not in template
    assert "First seen" not in template
    assert "Last seen" not in template
    assert "Dismissed at" not in template
    assert "Profile type" not in template
    assert "Profile ID" not in template
    assert "Впервые найден" in template
    assert "Последний раз найден" in template
    assert "Дата увольнения" in template
    assert "Тип профиля" in template
    assert "ID профиля" in template
    assert "Обновить данные" in template
    assert 'action="/admin/employees/{{ employee.id }}/refresh"' in template
    assert "Снапшоты" in template
--- a/tests/test_parser.py
+++ b/tests/test_parser.py
@@ -1,9 +1,124 @@
 from bs4 import BeautifulSoup
-from app.parser.profile import extract_person_tabs
+from app.parser.profile import enrich_sections_from_hse_widgets, extract_person_tabs, extract_sections
 from app.parser.profile_url import normalize_profile_url, parse_profile_identity
 class FakeResponse:
    def __init__(self, payload):
        self.payload = payload
    def raise_for_status(self):
        return None
    def json(self):
        return self.payload
 class FakeSession:
    def __init__(self):
        self.posts = []
        self.gets = []
    def post(self, url, **kwargs):
        self.posts.append((url, kwargs))
        return FakeResponse(
            {
                "status": "ok",
                "result": {
                    "more": False,
                    "total": 1,
                    "items": [
                        {
                            "id": "888959076",
                            "type": "ARTICLE",
                            "title": "Дублирование пакетов",
                            "year": 2023,
                            "language": {"name": "ru"},
                            "status": 1,
                            "authorsByType": {
                                "author": [
                                    {
                                        "id": "568398853",
                                        "href": "/org/persons/568398853",
                                        "title": {"ru": "Левицкий И. А.", "en": ""},
                                        "reverseTitle": {"ru": "И. А. Левицкий", "en": ""},
                                    }
                                ]
                            },
                            "description": {"short": {"ru": "Информационные процессы. 2023."}},
                            "annotation": {"ru": "<p>Русская аннотация</p>"},
                            "documents": {"DOI": {"href": "https://doi.org/10.1/test"}},
                        }
                    ],
                },
            }
        )
    def get(self, url, **kwargs):
        self.gets.append((url, kwargs))
        return FakeResponse(
            {
                "lang": "ru",
                "success": True,
                "data": [
                    {
                        "id": 1045750164,
                        "year": 2025,
                        "level": "Бакалавриат",
                        "title": "Аппаратно-программный комплекс защиты сети",
                        "rating": 8,
                        "student": "Лесняк Владислав Евгеньевич",
                        "learnProgram": {"title": "Информатика и вычислительная техника", "url": "https://hse.ru/ba/isct/"},
                        "orgUnit": {"title": "МИЭМ", "url": "https://www.hse.ru/org/url/59315150"},
                        "supervisors": [{"url": "https://www.hse.ru/org/persons/803294906", "name": "Борисов Сергей Петрович"}],
                    }
                ],
            }
        )
 class GroupedPublicationsSession(FakeSession):
    def post(self, url, **kwargs):
        self.posts.append((url, kwargs))
        return FakeResponse(
            {
                "status": "ok",
                "result": {
                    "more": False,
                    "total": 1,
                    "groupType": 2,
                    "items": {
                        "year": {
                            "header": {"ru": "по году", "en": "by year"},
                            "criteria": {"year": []},
                            "items": {
                                "2011": [
                                    {
                                        "id": "146366790",
                                        "type": "ARTICLE",
                                        "title": "Развитие теории самосогласованного поля",
                                        "year": 2011,
                                        "description": {"short": {"ru": "Журнал физической химии 2011."}},
                                    }
                                ],
                                "2012": [
                                    {
                                        "id": "146367323",
                                        "type": "ARTICLE",
                                        "title": "Self-consistent field theory investigation",
                                        "year": 2012,
                                        "description": {"short": {"en": "Russian Journal of Physical Chemistry A 2012."}},
                                    }
                                ],
                            },
                        }
                    },
                },
            }
        )
 def test_normalize_profile_url_supports_staff_and_org_persons():
    assert normalize_profile_url("/staff/avsergeev#sci") == "https://www.hse.ru/staff/avsergeev"
    assert normalize_profile_url("https://www.hse.ru/org/persons/123/") == "https://www.hse.ru/org/persons/123"
@@ -26,3 +141,136 @@ def test_extract_person_tabs_prefers_person_menu_addition():
    assert [tab["title"] for tab in tabs] == ["Домашняя страница", "Публикации"]
    assert tabs[1]["href"] == "https://www.hse.ru/staff/avsergeev#sci"
 def test_enrich_sections_from_hse_widgets_loads_publications_and_vkr():
    soup = BeautifulSoup(
        """
        <script src="/n/stat/publications/dist-w/publs.js" data-author="568398853" data-widget-name="AuthorSearch"></script>
        <script src="/n/stat/vkr/app.js" data-api-url="/n/vkr/api/" data-person-id="803294906"></script>
        """,
        "html.parser",
    )
    session = FakeSession()
    sections = enrich_sections_from_hse_widgets(
        session,
        soup,
        "https://www.hse.ru/org/persons/803294906",
        {"User-Agent": "test"},
        10,
        [],
    )
    publications = next(section for section in sections if section["type"] == "publications")
    theses = next(section for section in sections if section["type"] == "graduation_theses")
    assert publications["publications_count"] == 1
    assert publications["publications"][0]["url"] == "https://publications.hse.ru/view/888959076"
    assert publications["publications"][0]["doi_url"] == "https://doi.org/10.1/test"
    assert publications["publications"][0]["annotation"] == {"ru": "Русская аннотация"}
    assert publications["publications"][0]["authors"][0]["is_current_employee"] is True
    assert theses["theses_count"] == 1
    assert theses["theses"][0]["student"] == "Лесняк Владислав Евгеньевич"
    assert theses["theses"][0]["project_url"] == "https://www.hse.ru/edu/vkr/1045750164"
    assert session.posts[0][0] == "https://publications.hse.ru/api/searchPubs"
    assert session.gets[0][1]["params"] == {"supervisorId": "803294906"}
 def test_enrich_sections_from_hse_widgets_loads_grouped_publications():
    soup = BeautifulSoup(
        """
        <script src="/n/stat/publications/dist-w/publs.js" data-author="133709486" data-widget-name="AuthorSearch"></script>
        """,
        "html.parser",
    )
    session = GroupedPublicationsSession()
    sections = enrich_sections_from_hse_widgets(
        session,
        soup,
        "https://www.hse.ru/org/persons/133709486",
        {"User-Agent": "test"},
        10,
        [],
    )
    publications = next(section for section in sections if section["type"] == "publications")
    assert publications["publications_count"] == 2
    assert [item["id"] for item in publications["publications"]] == ["146366790", "146367323"]
    assert publications["publications"][0]["url"] == "https://publications.hse.ru/view/146366790"
    assert publications["publications"][1]["url"] == "https://publications.hse.ru/view/146367323"
 def test_news_heading_with_publications_word_does_not_absorb_widget_publications():
    soup = BeautifulSoup(
        """
        <h2>Статья профессора МИЭМ вошла в число самых популярных публикаций на портале SpringerLink</h2>
        <div class="post__text">
          <p>Первоначально статья профессора вышла в российском журнале.</p>
        </div>
        <script src="/n/stat/publications/dist-w/publs.js" data-author="133709486" data-widget-name="AuthorSearch"></script>
        """,
        "html.parser",
    )
    session = FakeSession()
    sections = extract_sections(soup, "https://www.hse.ru/org/persons/133709486")
    sections = enrich_sections_from_hse_widgets(
        session,
        soup,
        "https://www.hse.ru/org/persons/133709486",
        {"User-Agent": "test"},
        10,
        sections,
    )
    assert sections[0]["type"] == "paragraphs"
    assert sections[0]["title"].startswith("Статья профессора")
    publications = [section for section in sections if section["type"] == "publications"]
    assert len(publications) == 1
    assert publications[0]["title"] == "Публикации и исследования"
    assert publications[0]["publications_count"] == 1
 def test_extract_sections_parses_employee_news_links():
    soup = BeautifulSoup(
        """
        <div class="b-person-data posts hidden printable" data-tab="press_links_news" tab-node="press_links_news">
          <div class="post f8">
            <div class="post__extra">
              <div class="post-meta">
                <div class="post-meta__date">
                  <div class="post-meta__day">28</div>
                  <div class="post-meta__month">апр.</div>
                  <div class="post-meta__year">2026</div>
                </div>
              </div>
            </div>
            <div class="post__content">
              <h2 class="first_child"><a class="link" href="/news/edu/1153850518.html">Как финал ВсОШ формирует кадры</a></h2>
              <div class="post__text"><p class="with-indent">Краткое описание новости.</p></div>
            </div>
          </div>
          <div class="post f8">
            <div class="post__content">
              <h2><a href="https://miem.hse.ru/news/1123589375.html">Партнер магистратуры</a></h2>
            </div>
          </div>
        </div>
        """,
        "html.parser",
    )
    sections = extract_sections(soup, "https://www.hse.ru/staff/avsergeev")
    assert len(sections) == 1
    news = sections[0]
    assert news["type"] == "news"
    assert news["news_count"] == 2
    assert news["news_links"][0]["title"] == "Как финал ВсОШ формирует кадры"
    assert news["news_links"][0]["url"] == "https://www.hse.ru/news/edu/1153850518.html"
    assert news["news_links"][0]["summary"] == "Краткое описание новости."
    assert news["news_links"][0]["published_at"] == "2026-04-28T00:00:00+00:00"
    assert news["news_links"][0]["published_year"] == 2026
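
Review note on `test_extract_sections_parses_employee_news_links`: the assertions above expect the `post-meta__day` / `post-meta__month` / `post-meta__year` fragments (`28`, `апр.`, `2026`) to become `2026-04-28T00:00:00+00:00`, and the relative `href` to be resolved against the profile URL. The diff shown here does not include the parser side, so the following is only a minimal sketch of how that conversion could look. The names `RU_MONTHS`, `parse_post_date` and `absolute_news_url` are hypothetical, the month table beyond `апр.` is an assumption about the abbreviations HSE pages use, and returning `None` for a missing or invalid date is an assumed policy (the second post in the fixture has no date block, and the test does not assert what happens to it).

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urljoin

# Hypothetical lookup table. Only "апр." appears in the test fixture;
# the remaining abbreviations are assumptions.
RU_MONTHS = {
    "янв": 1, "фев": 2, "мар": 3, "апр": 4, "май": 5, "мая": 5,
    "июн": 6, "июл": 7, "авг": 8, "сен": 9, "окт": 10, "ноя": 11, "дек": 12,
}


def parse_post_date(day: Optional[str], month: Optional[str], year: Optional[str]) -> Optional[str]:
    """Build an ISO 8601 UTC timestamp (midnight) from the three date fragments.

    Returns None when a fragment is missing, the month is unknown,
    or the combination is not a valid calendar date (assumed policy).
    """
    if not (day and month and year):
        return None
    # "апр." -> "апр": trim whitespace, lowercase, drop the trailing dot, keep 3 letters.
    key = month.strip().lower().rstrip(".")[:3]
    month_number = RU_MONTHS.get(key)
    if month_number is None:
        return None
    try:
        moment = datetime(int(year), month_number, int(day), tzinfo=timezone.utc)
    except ValueError:
        return None
    return moment.isoformat()


def absolute_news_url(href: str, base_url: str) -> str:
    """Resolve a possibly relative news href against the profile URL."""
    return urljoin(base_url, href)


if __name__ == "__main__":
    print(parse_post_date("28", "апр.", "2026"))
    print(absolute_news_url("/news/edu/1153850518.html", "https://www.hse.ru/staff/avsergeev"))
```

If the real parser follows a different policy for undated posts, an explicit assertion on `news["news_links"][1]` would document it.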
Author	SHA1	Message	Date
admin	cd46f6d361	Merge pull request 'feat: add employee news links parsing and storage' (#28 ) from feature/employee-news-links into main Reviewed-on: #28	2026-05-22 15:52:23 +00:00
Anton	4d2a071ec0	feat: add employee news links parsing and storage	2026-05-22 18:50:25 +03:00
admin	680ac6e980	Merge pull request 'feat: add detailed employee publications storage and MCP docs' (#27 ) from feature/employee-publications-db into main Reviewed-on: #27	2026-05-15 14:40:29 +00:00
Anton	dbaf3af468	feat: add detailed employee publications storage and MCP docs	2026-05-15 17:39:41 +03:00
admin	2819a6c334	Merge pull request 'fix: add runtime schema guard for skipped count' (#26 ) from fix/runtime-schema-skipped-count into main Reviewed-on: #26	2026-05-14 10:30:06 +00:00
Anton	41fb54c5e7	fix: add runtime schema guard for skipped count	2026-05-14 13:29:27 +03:00
admin	4b91effee3	Merge pull request 'feat: adds crawl resource cache' (#25 ) from feature/crawl-resource-cache into main Reviewed-on: #25	2026-05-14 09:27:06 +00:00
Anton	6724b3f369	feat: adds crawl resource cache	2026-05-14 12:21:44 +03:00
admin	1791ad8d4d	Merge pull request 'chore: adds additional mcp-description file to gitignore' (#24 ) from docs/mcp-description into main Reviewed-on: #24	2026-05-14 08:52:29 +00:00
Anton	993888b003	chore: adds additional mcp-description file to gitignore	2026-05-14 11:51:51 +03:00
admin	5180b89b81	Merge pull request 'feat: add dataset checkpoint sync for MCP' (#23 ) from feature/dataset-version-sync into main Reviewed-on: #23	2026-05-14 08:01:26 +00:00
Anton	29451ccee1	feat: add dataset checkpoint sync for MCP	2026-05-14 11:00:46 +03:00
admin	a3ff9c6e9c	Merge pull request 'fix: separate news from publications and add employee refresh' (#22 ) from fix/publications-news-refresh into main Reviewed-on: #22	2026-05-13 13:12:06 +00:00
Anton	8e19dc9f35	fix: separate news from publications and add employee refresh	2026-05-13 16:11:13 +03:00
admin	5b9d71426d	Merge pull request 'fix: support grouped HSE publication API responses' (#21 ) from fix/grouped-publications-parser into main Reviewed-on: #21	2026-05-13 09:46:48 +00:00
Anton	efa7192e45	fix: support grouped HSE publication API responses	2026-05-13 12:46:07 +03:00
admin	b27d613143	Merge pull request 'fix: remove mcp-auth from yml-file' (#20 ) from fix/remove-mcp-auth-compose into main Reviewed-on: #20	2026-05-08 09:33:17 +00:00
Anton	a1ab1c0319	fix: remove mcp-auth from yml-file	2026-05-08 12:32:40 +03:00
admin	0b4e04544d	Merge pull request 'fix: remove MCP application-level authorization' (#19 ) from fix/remove-mcp-auth into main Reviewed-on: #19	2026-05-08 09:15:18 +00:00
Anton	7593a460c7	fix: remove MCP application-level authorization	2026-05-08 12:14:19 +03:00
admin	a4e7388bcf	Merge pull request 'fix: use direct onclick handlers for run rows' (#18 ) from fix/direct-run-row-click-handler into main Reviewed-on: #18	2026-05-07 15:25:26 +00:00
Anton	ac319b3ee5	fix: use direct onclick handlers for run rows	2026-05-07 18:23:14 +03:00
admin	8e004c46ef	Merge pull request 'fix: move run navigation from id link to table row' (#17 ) from fix/run-row-link-target into main Reviewed-on: #17	2026-05-07 14:04:07 +00:00
Anton	7fa28e8e47	fix: move run navigation from id link to table row	2026-05-07 17:03:36 +03:00
admin	1c4ad0bd9d	Merge pull request 'fix: make run rows clickable and limit dashboard runs' (#16 ) from fix/dashboard-run-row-clicks into main Reviewed-on: #16	2026-05-07 13:24:25 +00:00
Anton	52c5cc1af1	fix: make run rows clickable and limit dashboard runs	2026-05-07 16:23:39 +03:00
admin	c97ced52b4	Merge pull request 'feat: make dashboard metrics and run rows clickable' (#15 ) from feature/dashboard-clickable-metrics into main Reviewed-on: #15	2026-05-07 06:36:27 +00:00
Anton	deaecd8d3b	feat: make dashboard metrics and run rows clickable	2026-05-07 09:35:44 +03:00
admin	e4d4271e32	Merge pull request 'feat: track crawl run employee changes and verify dismissals' (#14 ) from feature/crawl-run-change-details into main Reviewed-on: #14	2026-05-06 12:14:51 +00:00
Anton	d0459a2c30	feat: track crawl run employee changes and verify dismissals	2026-05-06 15:13:15 +03:00
Anton	2331c7a28d	chore: removes sensitive data from docker file	2026-04-29 16:16:06 +03:00
admin	064c34ea32	Merge pull request 'feat: adds oauth server to docker' (#13 ) from feature/add-oauth-server into main Reviewed-on: #13	2026-04-29 12:59:55 +00:00
Anton	6a98ae4246	feat: adds oauth server to docker	2026-04-29 15:59:18 +03:00
admin	a6f2883091	Merge pull request 'feat: requires OAuth-only auth mode for MCP agents' (#12 ) from feature/mcp-oauth-oidc into main Reviewed-on: #12	2026-04-29 12:22:25 +00:00
Anton	d20b4f396b	feat: requires OAuth-only auth mode for MCP agents	2026-04-29 15:08:18 +03:00
admin	c7027bb503	Merge pull request 'feat: adds OAuth/OIDC authentication for MCP' (#11 ) from feature/mcp-oauth-oidc into main Reviewed-on: #11	2026-04-29 11:35:00 +00:00
Anton	ad0b15cc6e	feat: adds OAuth/OIDC authentication for MCP	2026-04-29 14:33:29 +03:00
admin	af864ecb44	Merge pull request 'fix: enrich HSE profile parsing with publications and theses' (#10 ) from fix/hse-profile-parser-publications-vkr-pagination into main Reviewed-on: #10	2026-04-29 11:16:17 +00:00
Anton	cc9481fc6c	fix: enrich HSE profile parsing with publications and theses	2026-04-29 14:15:29 +03:00
admin	cf578ce699	Merge pull request 'fix: allow empty CRAWL_LIMIT env value' (#9 ) from fix/empty-crawl-limit-env into main Reviewed-on: #9	2026-04-29 09:50:34 +00:00
Anton	765efa1a1c	fix: allow empty CRAWL_LIMIT env value	2026-04-29 12:49:58 +03:00
admin	86330885e3	Merge pull request 'fix: localize admin UI and simplify employees navigation' (#8 ) from fix/admin-russian-ux-cleanup into main Reviewed-on: #8	2026-04-29 09:39:42 +00:00
Anton	866e2b44d5	fix: localize admin UI and simplify employees navigation	2026-04-29 12:39:16 +03:00
admin	f411de740e	Merge pull request 'fix: avoid Jinja dict method collisions in admin templates' (#7 ) from fix/jinja-dict-method-collisions into main Reviewed-on: #7	2026-04-29 09:12:13 +00:00
Anton	cdfbb26875	fix: avoid Jinja dict method collisions in admin templates	2026-04-29 12:11:16 +03:00
admin	5eaad38076	Merge pull request 'fix: avoid Jinja dict items collision in employee card' (#6 ) from fix/employee-card-contact-items into main Reviewed-on: #6	2026-04-29 08:35:13 +00:00