Hello experts, here are some more details -qzh(胡子) @Calgary Chinese Forum: The Rolia Forum of Calgary

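1. All 2 million records are "increments".
2. These increments are not locked while the balances are being computed, because there are no writes (modifications) to that part at that time. Of course, "Table 2" itself still receives writes.
3. RAID, as Raymond suggested: TEMPTABLE -- RAID 1; LOG -- RAID 0; DATABASE -- RAID 10
4. "Table 3" only stores balances; for this simplified problem there is no index issue. "Balance" means each customer's holdings of each stock/fund as of the cutoff date. For example:
ID      Stock ID  Balance  Cutoff date (optional)
------------------------------------------------------------------------------------------
100001  MS        1000     20021130
100001  NT        500      20021130
100002  ATT       1200     20021130
100002  GE        300      20021130
...........
The computed "Table 3" should look like this.

5. The index on Table 2 is: ID + stock ID, nonclustered.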
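
Editor's sketch: the balance described in point 4 can, in principle, be produced by a single set-based aggregation over the ledger rather than by processing accounts one at a time. The example below is only an illustration under assumptions: it uses Python's built-in `sqlite3` as a stand-in for SQL Server 2000, and the table and column names (`trades`, `balances`, `side`, `qty`, `trade_date`) are hypothetical, not the poster's real schema.

```python
import sqlite3

def compute_balances(conn, cutoff):
    """Rebuild the balance table from the ledger in one set-based statement."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM balances")
        conn.execute(
            """
            INSERT INTO balances (id, stock_id, balance, cutoff_date)
            SELECT id, stock_id,
                   SUM(CASE WHEN side = 'BUY' THEN qty ELSE -qty END),
                   ?
            FROM trades
            WHERE trade_date <= ?
            GROUP BY id, stock_id
            """,
            (cutoff, cutoff),
        )

# Hypothetical schema standing in for "Table 2" and "Table 3".
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trades (id INTEGER, trade_date TEXT, side TEXT,
                         qty INTEGER, stock_id TEXT);
    CREATE TABLE balances (id INTEGER, stock_id TEXT, balance INTEGER,
                           cutoff_date TEXT);
    """
)
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?, ?, ?)",
    [
        (100001, "20021101", "BUY", 1500, "MS"),
        (100001, "20021115", "SELL", 500, "MS"),
        (100001, "20021120", "BUY", 500, "NT"),
        (100002, "20021105", "BUY", 1200, "ATT"),
        (100002, "20021110", "BUY", 300, "GE"),
        (100002, "20021201", "BUY", 50, "GE"),  # after the cutoff, ignored
    ],
)

compute_balances(conn, "20021130")
balances = conn.execute(
    "SELECT id, stock_id, balance, cutoff_date FROM balances "
    "ORDER BY id, stock_id"
).fetchall()
```

With this sample data, `balances` holds the same four rows as the table in point 4. Whether one aggregation scales to 20 million rows within the time window depends on the real schema and indexes, so it would need to be measured.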

(#891887@0)
Last Updated: 2002-12-3

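I have a few suggestions, though they may not apply to your situation; for reference only. At our company, the retrieval batch for each database of 10 x 9 million records takes only 2 minutes -kkkkkkkk(Toronto123); 2002-12-2 {221} (#890611@0)

1. Tune the SQL Server/NT configuration; avoid hot disks and network bottlenecks.
2. Check and tune the SQL statements.
(The following require larger program changes, so they are a last resort.)
3. If the batch is written in C++ or another language, use multiple processes/threads where possible, and a data cache to avoid repeated SQL lookups.
4. Of course, be careful when using ODBC.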

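Many thanks, Lao Wu -qzh(胡子); 2002-12-3 {298} (#891116@0)
Still, I don't quite understand what exactly you are computing. We need to compute the balance of every account. Each database has 40,000-50,000 accounts and 2 million transaction records, and there are 4 such databases. Eventually there will be 5 databases with 1 million accounts in total and at least 20 million transaction records, so we need to find a better method.

Could you be more specific, mainly about T-SQL and the table indexes? The structure can't be changed; otherwise everything else would have to change too.

Thanks in advance!!!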

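I haven't posted in a long time; a few words this time. -yangn(Raymond); 2002-12-3 {404} (#891391@0)
Your description is too general. What kind of calculation is it? Modification only, or query + modification? An index is a double-edged sword. Open the execution plan and check whether the calculation uses an index seek, an index scan, or a table scan.

On the system side, your calculation may use tempdb heavily. Which RAID configuration is your tempdb on? And which RAID levels hold your data and transaction log?

How much system memory do you have? Is the box dedicated to SQL Server or shared with other applications?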
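
Editor's sketch: the seek-versus-scan check suggested above is done in SQL Server through the execution plan. As a rough analogy only, the example below uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` to show how the same query changes from a full scan to an index search once an index on (ID, stock ID) exists. The table, column, and index names are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

def plan_details(conn, sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the detail text is the fourth column

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER, stock_id TEXT, qty INTEGER)")

query = "SELECT SUM(qty) FROM trades WHERE id = ?"

# Without an index, the only option is to read the whole table.
before = plan_details(conn, query, (100001,))

# Composite index mirroring the "ID + stock ID" index described in the thread.
conn.execute("CREATE INDEX ix_trades_id_stock ON trades (id, stock_id)")
after = plan_details(conn, query, (100001,))
```

Here `before` reports a scan of `trades`, while `after` names `ix_trades_id_stock`. SQL Server's plan output looks different, but the question being asked of it is the same.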
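Minimize the number of SQL calls! -kkkkkkkk(Toronto123); 2002-12-3 {381} (#892378@0)
1. Use bulk fetch/update/insert to reduce the number of SQL calls, or optimize the algorithm to the same end.
2. Use cost-based optimization.
3. Spread the same table across different disks.
4. Cache frequently accessed master tables in memory.
5. Review and optimize every SQL statement.
6. Tune at the system and DB level.

We use C++/Pro*C. Because the designer took parallel computation fully into account, the batch runs multiple threads (meaning multiple database connections), plus an application-level data cache and bulk fetch/update/insert. The results are very good.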
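
Editor's sketch: two of the ideas above, caching a master table in the application and writing results in bulk, can be illustrated roughly as follows. This is not the poster's C++/Pro*C code. It uses Python's `sqlite3` purely for illustration, and the schema and function names are hypothetical.

```python
import sqlite3

def load_customer_cache(conn):
    """Read the master table once so later lookups need no SQL round trip."""
    return dict(conn.execute("SELECT id, name FROM customers"))

def save_balances(conn, computed, customers):
    """Write all computed balances with one bulk call instead of one per row."""
    rows = [
        (cid, stock, qty)
        for (cid, stock), qty in computed.items()
        if cid in customers  # in-memory check replaces a SELECT per row
    ]
    with conn:
        conn.executemany("INSERT INTO balances VALUES (?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE balances (id INTEGER, stock_id TEXT, balance INTEGER);
    """
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(100001, "customer A"), (100002, "customer B")],
)

customers = load_customer_cache(conn)
computed = {
    (100001, "MS"): 1000,
    (100002, "GE"): 300,
    (999999, "XX"): 5,  # unknown customer, skipped via the cache
}
written = save_balances(conn, computed, customers)
```

The design choice is to trade memory for fewer round trips to the database, which only pays off when the cached table is small and rarely changes.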

great -maggie1001(学习ing&找工ing&); 2002-12-5 (#895670@0)

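Thanks first, Raymond, but I have more questions -qzh(胡子); 2002-12-3 {1093} (#891626@0)

It seems I didn't ask my question well. Let me try to be clearer:

The hardware should not be the problem:
DELL PowerEdge 8450, 4 CPUs, 4 GB memory (dedicated to SQL 2000)
TempTable, Log, Database File, and Database Executable File are each on a different RAID array, and the RAID levels should be fine too. On the hardware side, apart from adding more memory, there is nothing left to improve.

Software: SQL Server 2000
Let me simplify the question. In one database there are -->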
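Table 1: customer information
------------------------------------------------------------
ID NAME ADDRESS ... etc. ...

30,000 -- 40,000 records (that is, customers)

Table 2: transaction records (that is, the running ledger)
-------------------------------------------------------------
ID DATE TIME BUY/SELL Quantity StockID ... etc. ...

2 million records in total (the more they trade, the more we earn)

Table 3: balances
---------------------------------------------------------------
ID Balance CutoffDate ... etc. ...

The computed results go into Table 3. They are the balances, of course!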

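For a problem like this, with 1 million customers and more than 20 million transactions, is it possible to finish the calculation within five or six hours? I am asking how to optimize the T-SQL and the indexes, or other aspects; in short, anything but hardware. (Actually, the hardware is what I manage.)

What we really compute is more complex than what I described, but it is essentially the same.

I sincerely ask all the experts to weigh in!!!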

Incremental balance method -ra_95(小人-拒绝老去); 2002-12-3 {53} (#891674@0)
Amount.now = Amount.yesterday + MoneyFlowing.today :)
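
Editor's sketch: a minimal illustration of the formula above, keeping yesterday's balances and applying only today's trades instead of re-summing the whole ledger. The tuple layout and the BUY/SELL convention are assumptions made for the example.

```python
from collections import defaultdict

def apply_daily_flow(yesterday, todays_trades):
    """Return today's balances: yesterday's balances plus today's net flow."""
    today = defaultdict(int, yesterday)  # copy, so yesterday stays untouched
    for account, stock, side, qty in todays_trades:
        today[(account, stock)] += qty if side == "BUY" else -qty
    return dict(today)

yesterday = {(100001, "MS"): 1000}
todays_trades = [
    (100001, "MS", "SELL", 200),
    (100001, "NT", "BUY", 500),
]
today = apply_daily_flow(yesterday, todays_trades)
```

Here `today` maps `(100001, "MS")` to 800 and `(100001, "NT")` to 500. The work per run is proportional to one day's trades rather than to the full history.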
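When the database is busy with writes, many records are locked. If the required accuracy allows it, you can lower the read isolation level. Or copy the database to another machine and run the statistics there. -miketany(MIKE老狼); 2002-12-3 (#891715@0)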
A few thoughts. -yangn(Raymond); 2002-12-3 {591} (#891722@0)
From what you have described alone, it is hard to say whether your calculation can finish within 5 or 6 hours.

From an optimization standpoint:
The first table is relatively static, the heavy inserting and updating is concentrated in the second table, and the third table is for a one-time daily calculation. You could consider:
(1) Adding a presummarization column to the second table to hold the $ amount the specific ID spent on each stock ID, to reduce the amount of calculation.
(2) I don't know what kind of index you created on the third table. Before the calculation, drop the index, then recreate it (or them) after the calculation.
(3) Which RAID did you configure for tempdb? RAID 5, RAID 1, or RAID 0+1? RAID 1 is the best of them.
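
Editor's sketch: suggestion (2), dropping the index before the bulk recalculation and recreating it afterwards, might look roughly like this. It is shown with Python's `sqlite3` as a stand-in for SQL Server, and the table and index names are hypothetical.

```python
import sqlite3

def reload_balances(conn, rows):
    """Drop the index, bulk-load the new balances, then rebuild the index once."""
    with conn:  # commit on success, roll back on error
        conn.execute("DROP INDEX IF EXISTS ix_balances_id")
        conn.execute("DELETE FROM balances")
        conn.executemany("INSERT INTO balances VALUES (?, ?, ?)", rows)
        # Building the index once after the load avoids maintaining it
        # on every single inserted row.
        conn.execute("CREATE INDEX ix_balances_id ON balances (id)")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balances (id INTEGER, stock_id TEXT, balance INTEGER)"
)
conn.execute("CREATE INDEX ix_balances_id ON balances (id)")

reload_balances(conn, [(100001, "MS", 1000), (100002, "GE", 300)])

index_names = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'"
    )
]
```

Whether this helps depends on how many rows change per run: for a full reload it usually saves work, while for a small daily delta the rebuild can cost more than it saves.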
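I'm not familiar with SQL Server, but I hear SQL Server and Sybase are similar. At my previous company we used Sybase to process records like yours, on the order of millions, in just a couple of hours. Let me try adapting the idea using Sybase concepts -ivy_sh(纸蝶); 2002-12-3 {365} (#891747@0)
1) For large batch work, I don't think you can use the daytime client/server structure; it should be a batch. In Sybase, a stored procedure does the batch work, which reduces intermediate input/output, so the ODBC issue the earlier poster mentioned does not arise. Is T-SQL just "transaction SQL", suited only to daytime transactions?
2) Consider the indexes as a whole. I'm guessing your third table is handled by selecting the old balance and then updating it to the new balance (based on the transaction amounts in the ledger), so it can't go without an index, but it shouldn't have too many either. An index on ID alone is enough.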
Hello experts, here are some more details -qzh(胡子); 2002-12-3 {902} (#891887@0)

I guess the stored procedure used for the calculation uses a cursor, which works row by row, so a couple of hours is normal. Try looking at the code of the stored procedure. -greenspring(春天); 2002-12-4 (#894746@0)
Thanks -qzh(胡子); 2002-12-5 (#895610@0)

Try to use name (???) (similar to Pro*C in Oracle). Coding will be harder than a stored procedure, but performance should be faster than an SP. -ling7199(Michaell); 2002-12-6 {272} (#896861@0)

Try using C. Sorry, I forget the name for SQL Server; for Oracle it is Pro*C. The statement structure is to add a prefix before the normal SELECT statement: 'EXEC...'. It will speed up your processing and computation, but it won't help your CRUD transactions (those depend on your indexes).

Another thing (commit statement) -ling7199(Michaell); 2002-12-6 {168} (#896871@0)

How many records do you process per commit in the program? Or is there only one commit statement? If you commit every 1000 records (for example), you need logic to cover the failure case.
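
Editor's sketch: one possible way to commit every N records while still covering the failure case is to store a checkpoint in the same transaction as each batch, so that a rerun resumes after the last committed batch. The example uses Python's `sqlite3` for illustration. The `checkpoint` table, the `trade_no` sequence column, and the `signed_qty` column are all assumptions, not part of the poster's schema.

```python
import sqlite3

def process_in_batches(conn, batch_size=1000):
    """Apply unprocessed ledger rows to balances, one commit per batch."""
    while True:
        (last_done,) = conn.execute(
            "SELECT last_trade_no FROM checkpoint"
        ).fetchone()
        batch = conn.execute(
            "SELECT trade_no, id, stock_id, signed_qty FROM trades "
            "WHERE trade_no > ? ORDER BY trade_no LIMIT ?",
            (last_done, batch_size),
        ).fetchall()
        if not batch:
            break
        with conn:  # batch and checkpoint commit (or roll back) together
            for _trade_no, cid, stock, signed_qty in batch:
                cur = conn.execute(
                    "UPDATE balances SET balance = balance + ? "
                    "WHERE id = ? AND stock_id = ?",
                    (signed_qty, cid, stock),
                )
                if cur.rowcount == 0:  # first trade for this holding
                    conn.execute(
                        "INSERT INTO balances VALUES (?, ?, ?)",
                        (cid, stock, signed_qty),
                    )
            conn.execute(
                "UPDATE checkpoint SET last_trade_no = ?", (batch[-1][0],)
            )

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trades (trade_no INTEGER PRIMARY KEY, id INTEGER,
                         stock_id TEXT, signed_qty INTEGER);
    CREATE TABLE balances (id INTEGER, stock_id TEXT, balance INTEGER);
    CREATE TABLE checkpoint (last_trade_no INTEGER);
    INSERT INTO checkpoint VALUES (0);
    """
)
conn.executemany(
    "INSERT INTO trades VALUES (?, 100001, 'MS', 1)",
    [(n,) for n in range(1, 2501)],
)
conn.commit()

process_in_batches(conn)
process_in_batches(conn)  # rerun finds nothing new and changes nothing
```

Because the checkpoint moves only when its batch commits, a crash mid-batch leaves both the balances and the checkpoint at the previous batch boundary.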

Use trigger on the 2nd table, for the insert operation. -sailor(野苹果); 2002-12-6 {228} (#897121@0)

Use trigger on the 2nd table, for the insert operation.

Every time a new record is inserted into the 2nd table, the related record in the 3rd table is updated or created.

Therefore there's no need to spend hours for the daily calculation.
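
Editor's sketch: the trigger idea can be illustrated as below. This uses SQLite trigger syntax through Python's `sqlite3`, which differs from SQL Server 2000's T-SQL triggers (those work against the `inserted` pseudo-table rather than `NEW`). The schema and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trades (id INTEGER, side TEXT, qty INTEGER, stock_id TEXT);
    CREATE TABLE balances (id INTEGER, stock_id TEXT, balance INTEGER,
                           PRIMARY KEY (id, stock_id));

    -- Keep the balance row current on every ledger insert.
    CREATE TRIGGER trg_trades_insert AFTER INSERT ON trades
    BEGIN
        -- Create the balance row on the first trade for this holding.
        INSERT OR IGNORE INTO balances (id, stock_id, balance)
        VALUES (NEW.id, NEW.stock_id, 0);
        -- Apply the signed quantity of the new trade.
        UPDATE balances
        SET balance = balance
                      + CASE WHEN NEW.side = 'BUY' THEN NEW.qty
                             ELSE -NEW.qty END
        WHERE id = NEW.id AND stock_id = NEW.stock_id;
    END;
    """
)

conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?, ?)",
    [
        (100001, "BUY", 1500, "MS"),
        (100001, "SELL", 500, "MS"),
        (100002, "BUY", 300, "GE"),
    ],
)
conn.commit()

balances = conn.execute(
    "SELECT id, stock_id, balance FROM balances ORDER BY id, stock_id"
).fetchall()
```

Each insert now carries two extra statements inside its own transaction, so the cost moves from the nightly batch to the daytime inserts and should be measured before adopting the approach.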

No, this is not a good idea. Using a trigger adds a lot of overhead to the daytime transactions and therefore degrades the performance of the system. -yangn(Raymond); 2002-12-6 (#897164@0)

It depends on many factors, e.g. how the trigger scripts are written, how the tables are designed, and the number of records. -sailor(野苹果); 2002-12-6 {165} (#897252@0)
Having studied this specific case, I think he could try a trigger.

Of course, before any official changes, he should measure the performance of the new solutions.

No matter how the triggers are written, every time a new record is added to the second table there is always some overhead, because that kind of trigger fires before the current DML statement finishes. -yangn(Raymond); 2002-12-6 {829} (#897307@0)
I don't know how many updates and inserts occur on the second table. But since that is the main table for the daily transactions, the trigger behind each insert will affect not only the individual DML statement but also every transaction that includes those statements, and locking will become the big issue that slows down processing for the entire system.

The daily calculation is just an after-hours task, whose processing speed might be improved by implementing the trigger. But the daily production transactions would be greatly affected. It is common sense that production efficiency should always be the most important consideration.

The alternative for this infrastructure is to set up a separate data mart box that does the daily summarization after replicating the data from the production box.
