Technologies supporting the migration from Firestore to an RDB

2 years ago

科, 颖


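At Bitkey, Firestore has been our primary database, but we are now moving to a relational database (RDB).

The main reasons for the migration are the expansion of our business domain and the fact that Firestore's limited search capability had become a constraint.

Until now we supplemented search with Elasticsearch, Algolia, and a relational database used only for search. We now plan to make a relational database the primary store so that, in general, we can operate without auxiliary databases.

This post describes how we are carrying out the migration in workhub, our product for the office domain.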

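Policy

For now, we write data to the RDB synchronously rather than through an asynchronous mechanism such as Pub/Sub.

This is because we prioritize consistency immediately after a write.

Migration basically proceeds one collection at a time (except where some detailed specifications are reorganized along the way).

On the RDB side, each collection corresponds to one or more tables.

The flow is as follows:

Stop writing to Firestore and retire it.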

Architecture

This is roughly the overall architecture.

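1. Write

We run an ETL process that synchronizes additions, updates, and deletions in Firestore to the relational database (RDB).

As far as I know, there is no widely used solution for this, so we considered Google Cloud's Dataflow, which supports Firestore.

With Dataflow, you typically choose one of the languages supported by the Apache Beam SDK: Java, Python, or Go. Of these, the implementations provided by Google Cloud are mostly in Java. Because Firestore and an RDB are structured differently, fairly complex transformation logic is required. The team that mainly works with Firestore, however, typically uses Node.js (TypeScript).

Given this, we decided against Dataflow: if the code had to be written in Java, rolling the work out across the team would take a lot of time. We were also concerned about synchronization latency, which matters for the policy described above.

Because we could not find another suitable candidate, we decided to build the ETL ourselves.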

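How to embed the synchronization

As stated in the policy, we want the ETL to run synchronously, which means we must guarantee that it runs on every write operation.

Explicitly adding an RDB write at every call site is tedious and prone to omissions.

In workhub, writes to Firestore have gone through an operation-based framework from the start.

Each collection has a set of types and functions generated from a spec defined according to the OpenAPI specification. These ultimately build operations like the following, which are passed to the Firebase SDK to perform the write.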

export type FirestoreOperation = {
  path: string;
  operation:
    | {type: 'overwrite'; data: {id: string} & Record<string, any>}
    | {type: 'merge'; data: {id: string} & Record<string, any>}
    | {type: 'delete'; id: string};
};

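We hooked the ETL into this write path. Using the original operation as input, we build the corresponding ETL processing for the RDB, which makes the synchronization transparent to callers.

Load: write with Prisma

These are the steps the ETL performs.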
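
To make the idea concrete, here is a minimal sketch of a transform step that turns a FirestoreOperation into a list of RDB writes. The RdbWrite type, the CollectionMapping table, the column list, and the assumption that path is the collection path are all hypothetical and are not taken from the workhub code. The load step (applying the writes with Prisma) is omitted.

```typescript
// Operation shape as described in the article.
type FirestoreOperation = {
  path: string;
  operation:
    | {type: 'overwrite'; data: {id: string} & Record<string, unknown>}
    | {type: 'merge'; data: {id: string} & Record<string, unknown>}
    | {type: 'delete'; id: string};
};

// Hypothetical description of a single RDB write.
type RdbWrite =
  | {kind: 'replace'; table: string; id: string; row: Record<string, unknown>}
  | {kind: 'update'; table: string; id: string; row: Record<string, unknown>}
  | {kind: 'delete'; table: string; id: string};

// Hypothetical mapping from a collection name to a table and its columns.
type CollectionMapping = {table: string; columns: readonly string[]};

const mappings = new Map<string, CollectionMapping>([
  ['spaces', {table: 'Space', columns: ['id', 'name', 'organizationId']}],
]);

// Copy only the mapped columns that are present in the document data.
function pickColumns(
  data: Record<string, unknown>,
  columns: readonly string[],
): Record<string, unknown> {
  const row: Record<string, unknown> = {};
  for (const column of columns) {
    if (column in data) row[column] = data[column];
  }
  return row;
}

// Transform: derive the RDB writes that mirror one Firestore operation.
// Assumes `path` is a collection path such as "organizations/o1/spaces".
function toRdbWrites(op: FirestoreOperation): RdbWrite[] {
  const segments = op.path.split('/').filter(s => s.length > 0);
  const collection = segments[segments.length - 1] ?? '';
  const mapping = mappings.get(collection);
  if (mapping === undefined) return []; // collection not migrated yet

  const {operation} = op;
  if (operation.type === 'delete') {
    return [{kind: 'delete', table: mapping.table, id: operation.id}];
  }
  const row = pickColumns(operation.data, mapping.columns);
  const kind = operation.type === 'overwrite' ? 'replace' : 'update';
  return [{kind, table: mapping.table, id: operation.data.id, row}];
}
```

Because the transform is derived from the same operation object that is passed to the Firebase SDK, a caller cannot write to Firestore without also producing the matching RDB writes. A real mapping would also need to handle nested fields and child tables.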

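How to handle fields that are too complex?

Firestore is a document-oriented database that flexibly allows nesting, arrays, and so on. In an RDB these are usually split out into child tables, but in some cases mapping the data to tables is difficult.

In those cases we store the value in a Json column, but we still want to constrain the type of its contents.

Since many other use cases also need validation, we are building an in-house generator for Zod schemas.

We annotate the field in the Prisma schema like this: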


/// @zod.object({
/// ...
/// ...
/// })
complecatedField Json?

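The Zod schema is generated from this annotation.

One drawback is that the annotation is just a string inside a comment, so editor syntax support does not work on it.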
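
As an illustration of what constraining a Json column means at runtime, the sketch below uses a hand-written type guard as a dependency-free stand-in for the check a generated Zod schema would perform. The BusinessHours shape and the function names are hypothetical and are not taken from workhub.

```typescript
// Hypothetical shape stored in a Json column.
type BusinessHours = {
  timezone: string;
  slots: {day: number; open: string; close: string}[];
};

const isRecord = (value: unknown): value is Record<string, unknown> =>
  typeof value === 'object' && value !== null && !Array.isArray(value);

// Stand-in for the validation a generated Zod schema would provide.
function isBusinessHours(value: unknown): value is BusinessHours {
  if (!isRecord(value)) return false;
  if (typeof value.timezone !== 'string') return false;
  if (!Array.isArray(value.slots)) return false;
  return value.slots.every(
    (slot: unknown) =>
      isRecord(slot) &&
      typeof slot.day === 'number' &&
      typeof slot.open === 'string' &&
      typeof slot.close === 'string',
  );
}

// Reject malformed values before they reach the Json column.
function assertBusinessHours(value: unknown): BusinessHours {
  if (!isBusinessHours(value)) {
    throw new Error('Invalid BusinessHours JSON');
  }
  return value;
}
```

Calling assertBusinessHours on the transformed value just before the write keeps untyped JSON from being stored, and the same check can be reused when reading the column back.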

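2. Read (backend)

Firestore lets browsers and mobile apps query the database directly, which allows flexible queries tailored to each use case.

To carry this experience over as directly as possible, we decided to adopt GraphQL.

The official Prisma documentation summarizes several solutions for building a GraphQL server backed by Prisma.

After trying several of them, we chose Pothos, which offered the best experience.

As the official example shows, Prisma fields can in many cases be exposed to GraphQL directly, which is fairly easy, and fields that need special mapping can also be defined quite flexibly.

Defining every field one by one when mapping Prisma fields directly is somewhat tedious, but on reflection, with Copilot's help such simple mappings have become quite easy.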

// Create an object type based on a prisma model
// without providing any custom type information
builder.prismaObject('User', {
  fields: (t) => ({
    // expose fields from the database
    id: t.exposeID('id'),
    email: t.exposeString('email'),
    bio: t.string({
      // automatically load the bio from the profile
      // when this field is queried
      select: {
        profile: {
          select: {
            bio: true,
          },
        },
      },
      // user will be typed correctly to include the
      // selected fields from above
      resolve: (user) => user.profile.bio,
    }),
    // Load posts as list field.
    posts: t.relation('posts', {
      args: {
        oldestFirst: t.arg.boolean(),
      },
      // Define custom query options that are applied when
      // loading the post relation
      query: (args, context) => ({
        orderBy: {
          createdAt: args.oldestFirst ? 'asc' : 'desc',
        },
      }),
    }),
    // creates relay connection that handles pagination
    // using prisma's built in cursor based pagination
    postsConnection: t.relatedConnection('posts', {
      cursor: 'id',
    }),
  }),
});

Source: https://pothos-graphql.dev/docs/plugins/prisma#example

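Flexible query arguments

Firestore supports query conditions such as "in" and "array-contains". With GraphQL, we achieve similar functionality by defining query arguments.

For example, filtering by ID: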

import {builder} from '@/graphql/common/builder';

export const IdInput = builder.inputType('IdInput', {
  fields: t => ({
    equals: t.field({type: 'String', required: false}),
    not: t.field({type: 'String', required: false}),
    in: t.field({type: ['String'], required: false}),
    notIn: t.field({type: ['String'], required: false}),
  }),
});
export type WhereIdInput = {
  equals?: string | null | undefined;
  not?: string | null | undefined;
  in?: string[] | undefined | null;
  notIn?: string[] | undefined | null;
};

export const prismaWhereIdNullable = (input: WhereIdInput) => {
  return input;
};

export const prismaWhereId = (input: WhereIdInput) => {
  return {
    equals: input.equals ?? undefined,
    not: input.not ?? undefined,
    in: input.in ?? undefined,
    notIn: input.notIn ?? undefined,
  };
};
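
The `?? undefined` conversions matter because optional GraphQL inputs may arrive as an explicit null, whereas Prisma treats undefined as "no condition" and null as an actual value. As a sketch, the same normalization can be written once as a generic helper. The name nullsToUndefined and its type are hypothetical and are not part of Pothos or Prisma.

```typescript
// Maps every property type T[K] to the same type without null.
type NullToUndefined<T> = {
  [K in keyof T]: Exclude<T[K], null> | undefined;
};

// Hypothetical helper: replace explicit nulls with undefined so that
// omitted and null GraphQL inputs are both treated as "no condition".
function nullsToUndefined<T extends Record<string, unknown>>(
  input: T,
): NullToUndefined<T> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(input)) {
    const value = input[key];
    out[key] = value === null ? undefined : value;
  }
  return out as NullToUndefined<T>;
}
```

With such a helper, a function like prismaWhereId reduces to a single call, at the cost of a type cast inside the helper. The explicit per-field version shown earlier avoids the cast and keeps the accepted fields visible.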

String search:

import {builder} from '@/graphql/common/builder';

export const StringInput = builder.inputType('StringInput', {
  fields: t => ({
    equals: t.field({type: 'String', required: false}),
    not: t.field({type: 'String', required: false}),
    // Be careful about indexes when using this
    contains: t.field({type: 'String', required: false}),
    notContains: t.field({type: 'String', required: false}),
    startsWith: t.field({type: 'String', required: false}),
    // Be careful about indexes when using this
    endsWith: t.field({type: 'String', required: false}),
  }),
});

Search on string list columns:

import {builder} from '@/graphql/common/builder';

export const StringListInput = builder.inputType('StringListInput', {
  fields: t => ({
    equals: t.field({type: ['String'], required: false}),
    has: t.field({type: 'String', required: false}),
    hasEvery: t.field({type: ['String'], required: false}),
    hasSome: t.field({type: ['String'], required: false}),
    isEmpty: t.field({type: 'Boolean', required: false}),
  }),
});
export type WhereStringListInput = {
  equals?: string[] | null | undefined;
  has?: string | null | undefined;
  hasEvery?: string[] | undefined | null;
  hasSome?: string[] | undefined | null;
  isEmpty?: boolean | undefined | null;
};

export const prismaWhereStringListNullable = (input: WhereStringListInput) => {
  return input;
};

export const prismaWhereStringList = (input: WhereStringListInput) => {
  return {
    equals: input.equals ?? undefined,
    has: input.has ?? undefined,
    hasEvery: input.hasEvery ?? undefined,
    hasSome: input.hasSome ?? undefined,
    isEmpty: input.isEmpty ?? undefined,
  };
};

We implement InputTypes and utilities like these.

They are used as follows.

args: {
  id: t.arg({type: IdInput}),
},
resolve: async (query, args) => {
  return prisma.user.findMany({
    ...query,
    where: {
      id: args.id ? prismaWhereId(args.id) : undefined,
    }
  })
}

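Because Prisma provides a friendly interface with filters such as hasEvery and startsWith, filtering is easy: define an InputType that mirrors it and pass the result to where.

Pothos itself has started implementing a similar plugin, but at this stage the package is described as highly experimental and not recommended for production use, so we would like to consider migrating once it stabilizes.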

Security rules

In Firestore, security rules control authorization for resources.

match /organizations/{organizationId}/spaces {
  allow read, write: if request.auth != null && request.auth.token.organizationId == organizationId
}

Pothos provides an auth plugin that can implement something similar.

For example, authorization can be controlled per model as follows.

export const Space = builder.prismaNode('Space', {
  authScopes: (space, context) => {
    return !!context.operator && space.organizationId === context.operator.organizationId;
  },
  // ...

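Scopes can also be set per field, which allows flexible control.

For stricter control, Row Level Security (RLS) in PostgreSQL is an option, but workhub has not adopted it yet.

For reference, Prisma provides an example that uses client extensions to switch tenants under RLS. We will evaluate it when the need arises.

Other

In front of Pothos, we use Apollo Server as the GraphQL server.

We had no particular reason for choosing Apollo Server, so we may move to GraphQL Yoga or a similar server if needed.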

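3. Read (frontend)

We chose urql as the GraphQL client on the frontend.

So far we do not use mutations, so for our purposes there is little difference between Apollo Client and urql. We chose urql because it is simpler to use.

Depending on circumstances, we may consider moving to Apollo Client later.

We use the client-preset of graphql-codegen for TypeScript support.

The GraphQL schema could be fetched by accessing the server, but we chose simply to sync it as a file: GitHub Actions detects changes on the GraphQL server side and syncs them to the frontend repository.

The Pothos schema can be output as follows.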

import {printSchema, lexicographicSortSchema} from 'graphql';

const schema = builder.toSchema({});
const schemaAsString = printSchema(lexicographicSortSchema(schema));
console.log(schemaAsString);

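4. Subscriptions

Firestore provides the onSnapshot function to receive notifications when data is updated. GraphQL has a mechanism called Subscription, and we are considering introducing it, but since no feature currently needs notifications, we have not done so yet.

The rough idea is to go from Prisma, through Cloud Pub/Sub, to WebSocket (with subscriptions implemented over WebSocket between the GraphQL server and client).

We need to decide how to host the WebSocket server. It can run on Cloud Run, but we need to check whether stability and cost are acceptable for long-lived connections.

These are among the points we think need to be considered.

Conclusion

We combined an RDB, a home-grown ETL, and GraphQL to make the migration smooth.

However, some issues remain, such as the extra overhead on writes and the need to pay attention to the efficiency of GraphQL queries against the RDB.

For query efficiency, Prisma has a mechanism similar to DataLoader, and table definitions can be improved using AlloyDB's Query Insights and index recommendation features.

Some parts are still missing, and we will fill them in as we proceed.

Day 11 of the Bitkey Advent Calendar 2023 will be written by @0yoyoyo. Stay tuned!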